WHITE PAPER 
 
 
 
 
 
	  
Shifting	  the	  goalposts—from	  high	  
impact	  journals	  to	  high	  impact	  data	  
Anja	  Gassner1,	  Luz	  Marina	  Alvare2,	  Zoumana	  Bamba3,	  Douglas	  Beare4,	  Marichu	  
Bernardo5,	  Chandrashekhar	  Biradar6,	  Martin	  van	  Brakel7,	  Robert	  Chapman8,	  Guntuku	  
Dileepkumar9,	  Ibnou	  Dieng10,	  Sufiet	  Erlita11,	  Richard	  Fulss12,	  Jane	  Poole13,	  Mrigesh	  
Kshatriya11,	  Guvener	  Selim14	  Reinhard	  Simon14,	  Kai	  Sonder12,	  Nilam	  Prasai2,	  Maria	  
Garruccio8,	  Simone	  Staiger	  Rivas14,	  Maya	  Rajasekharan14	  ,	  Chukka	  Srinivasa	  Rao9	  	  
	  
	  
	  
	  
1	  World	  Agroforestry	  Centre	  (ICRAF),	  United	  Nations	  Avenue,	  Gigiri,	  PO	  Box	  30677	  Nairobi	  00100	  Kenya	  
2	  International	  Food	  Policy	  Research	  Institute	  (IFPRI),	  2033	  K	  St,	  NW,	  Washington,	  DC	  20006-­‐1002	  USA	  
3	  International	  Institute	  of	  Tropical	  Agriculture	  (IITA),	  HQ-­‐PMB	  5320,	  Ibadan,	  Oyo	  State	  Nigeria	  
4	  WorldFish,	  Jalan	  Batu	  Maung,	  Batu	  Maung,	  11960	  Bayan	  Lepas,	  Penang,	  Malaysia,	  	  PO	  Box	  500	  GPO,	  10670	  Penang,	  
Malaysia	  
5	  International	  Rice	  Research	  Institute	  (IRRI),	  DAPO	  Box	  7777	  Metro	  Manila	  1301,	  Los	  Baños,	  Philippines	  
6	  International	  Center	  for	  Agricultural	  Research	  in	  the	  Dry	  Areas	  (ICARDA,	  "Dalia	  Building	  2nd	  Floor,	  Bashir	  El	  Kassar	  
Street,	  Verdun,	  Beirut,	  Lebanon	  1108-­‐2010	  P.O.	  Box	  114/5055	  Beirut,	  	  Lebanon"	  
7	  International	  Water	  Management	  Institute	  (IWMI),	  P.	  O.	  Box	  2075,	  Colombo,	  Sri	  Lanka,	  127,	  Sunil	  Mawatha,	  Pelawatte,	  
Battaramulla,	  Sri	  Lanka	  
8	  Bioversity	  International,	  "HQ-­‐	  Via	  dei	  Tre	  Denari	  472/a	  00057	  Maccarese	  (Fiumicino)	  Rome,	  Italy"	  
9	  International	  Crops	  Research	  Institute	  for	  the	  Semi-­‐Arid	  Tropics	  (ICRISAT),	  Patancheru	  502	  324	  Andhra	  Pradesh,	  India	  
10	  Africa	  Rice	  Center	  (AfricaRice)	  01	  B.P.	  2031,	  Cotonou,	  Benin	  
11	  Center	  for	  International	  Forestry	  Research	  (CIFOR),	  HQ-­‐	  Jalan	  CIFOR,	  Situ	  Gede	  
Bogor	  (Barat)	  16115,	  Indonesia,	  Mailing-­‐P.O.	  Box	  0113	  BOCBD	  Bogor	  16000,	  Indonesia"	  
12	  International	  Maize	  and	  Wheat	  Improvement	  Center	  (CIMMYT),	  Km.	  45,	  Carretera,	  México-­‐Veracruz,	  El	  Batán,	  Texcoco	  
CP	  56130,	  Edo.	  de	  México,	  Apdo.	  Postal	  6-­‐641,	  06600	  Mexico,	  D.F.,	  Mexico	  
13	  International	  Livestock	  Research	  Institute	  (ILRI),	  HQ-­‐PO	  Box	  30709,	  Nairobi,	  00100	  ,	  Old	  Naivasha	  Road,	  ,	  Nairobi,	  
Kenya	  
14	  International	  Potato	  Center	  (CIP),	  Avenida	  La	  Molina	  1895,	  La	  Molina,	  Apartado	  Postal	  1558,	  Lima,	  Peru	  
15International	  Center	  for	  Tropical	  Agriculture	  (CIAT),	  Km	  17,	  Recta	  Cali-­‐Palmira	  
Apartado	  Aéreo	  6713,	  Cali,	  Colombia	  
	   	  
	  
Contents	  
	  
Introduction	  ....................................................................................	  3	  
Types	  of	  research	  data	  ....................................................................	  4	  
Research	  data	  infrastructure	  across	  the	  centers	  ............................	  6	  
Success	  stories	  ..............................................................................	  15	  
Barriers	  to	  mainstreaming	  data	  sharing	  .......................................	  18	  
Recommendations	  ........................................................................	  21	  
Key	  references	  ..............................................................................	  24	  
	   	  
PAGE	  -­‐	  2	  -­‐	  
Introduction	  
CGIAR	   is	  a	  global	  agriculture	  research	  partnership.	   Its	  science	   is	  carried	  out	  by	  
the	   15	   research	   centers	   who	   are	   members	   of	   the	   CGIAR	   Consortium	   in	  
collaboration	   with	   hundreds	   of	   partner	   organizations.	   Each	   member	   of	   the	  
CGIAR	  Consortium	  has	   the	  mandate	   to	   contribute	   to	   the	   eradication	   of	   hunger	  
and	   poverty	   at	   the	   global	   level	   by	   advancing	   research	   in	   development.	   Being	  
funded	  mainly	   through	  public	   funds	   there	   is	  an	  obligation	   to	  extract	  maximum	  
public	   good	   value	   from	   the	   research	   data	   the	   individual	   centers	   are	   collecting.	  
With	   the	   exception	   of	   a	   few	   designated	   projects,	   in	   the	   past,	   data	   itself	   was	  
merely	   seen	  as	  a	  means	   to	  an	  end,	   synthesized	   to	  produce	   selected	  knowledge	  
products	  such	  as	  publications,	  technical	  manuals	  or	  policy	  briefs.	  
Changing	   financial	   realities	   and	   the	   need	   to	   reach	   our	   aid	   targets	   with	   less	  
available	   resources	   is	   leading	   to	   a	   shift	   in	   thinking	   for	   new	   ways	   to	   make	  
scientific	  research,	  and	  research	  data,	  more	  available,	  reusable	  and	  reproducible.	  
Sufficiently	   preserved	   and	   replicable	   data	   are	   more	   absolute	   than	  
contemporaneously	  drawn	  conclusions	  and,	  if	  they	  are	  collected	  to	  address	  one	  
scientific	   question,	   can	   later	   be	   applied	   for	   the	   solution	   of	   entirely	   different	  
problems	  (Altman	  et	  al.,	  2006).	  
A	   growing	   number	   of	   donors	   now	   require	   data	   sharing	   requirements	   in	   their	  
standard	  grant	  contracts	  and	  open	  access	   to	  be	  applied	   to	   the	  research	  data	  of	  
the	  work	  they	  fund.	  	  DFID	  and	  the	  European	  Commission	  are	  among	  those	  who	  
are	   now	   adopting	   this	   policy.	   As	   one	   of	   the	   first	   custodians	   of	   development	  
related	  data,	   the	  World	  Bank,	   launched	   an	   open	  data	   initiative	   in	   2012.	   CGIAR	  
itself	   followed	   this	  example	  and	  approved	   in	  March	  2012	  a	  policy	   that	  clarifies	  
CGIAR	   research	   results,	   including	   data,	   are	   openly	   available	   and	   accessible	   by	  
default	   –	   and	   that	   any	   deviation	   has	   to	   be	   justified.	   	   Open	   data	   is	   not	   a	   new	  
concept	  for	  CGIAR,	  but	  in	  the	  past	  has	  largely	  remained	  linked	  to	  specific	  projects	  
or	  core	  programs	  that	  are	  designed	  to	  collect	  and	  share	  data	  widely.	  	  
In	  keeping	  with	   the	   changes	   to	   the	  amount,	  management	  and	  analyses	  of	  data,	  
the	   role	   of	   scientists	   working	   in	   development,	   such	   as	   those	   at	   the	   CGIAR	  
Centers,	   is	   changing.	   Previously,	   grant	   awards,	   and	   performance	   evaluations	  
focused	   strongly	  on	  publication	   in	  high-­‐impact	   journals.	   Fortunately,	   there	   is	   a	  
rediscovery	  that	  the	  real	  currency	  of	  research	  and	  scientific	  knowledge	  –the	  data	  
methods	   and	   ideas-­‐	   are	   also	   important	   instruments,	   in	   their	   own	   right,	   to	  
accelerate	  the	  impact.	  Our	  scientists	  are	  increasingly	  using	  their	  skills	  and	  their	  
strategic	   advantage	   of	   being	  well	   integrated	   in	   farming	   communities	   to	   create	  
well	  designed	  datasets	  for	  others	  to	  use.	  While	  there	  is	  no	  doubt	  that	  progress	  on	  
development	  related	  research	  questions	  would	  accelerate	  with	  increased	  access	  
to	   well	   documented	   and	   readily	   accessible	   data,	   the	   move	   also	   puts	   more	  
pressure	  on	  the	  gatherers,	  compilers,	  analysts	  and	  keepers	  of	  the	  data.	  Because	  
of	  previous	  emphasis	  on	  scientific	  publications	  only,	  research	  projects	  usually	  do	  
not	  budget	  for	  the	  extra	  costs	  that	  occur	  for	  publishing	  and	  long-­‐term	  storage	  of	  
data.	   While	   projects	   adequately	   account	   for	   the	   costs	   of	   conducting	   research,	  
including	   the	   collection	   and	   analysis	   of	   research	   data,	   they	   seldom	   include	  
preparing	  of	  data	  and	  metadata	  and	  curation	  as	  part	  of	  the	  costs	  of	  the	  research	  
process	  led	  alone	  long-­‐term	  costs	  for	  data	  storage	  and	  preservation	  beyond	  the	  
immediate	   lifetime	   of	   a	   project.	   Even	  more	   importantly	   the	   research	  nature	   of	  
CGIAR	  makes	  it	  mandatory	  to	  ensure	  that	  the	  contributions	  of	  those	  individuals	  
and	  organizations	  are	  recognized,	  whose	  reputations	  and	  careers	  depend	  on	  that	  
recognition.	   Data	   generators	   have	   little	   opportunities	   to	   gain	   recognition	   from	  
PAGE	  -­‐	  3	  -­‐	  
publishing	   high	   value	   datasets	   under	   a	   performance	   evaluation	   system	   that	  
strongly	  emphases	  scientific	  publications.	  	  
The	   purpose	   of	   this	   white	   paper	   is	   to	   provide	   an	   overview	   of	   the	   ongoing	  
initiatives	  at	  center	   level	  to	  respond	  to	  changing	  public	  expectations	  and	  to	  the	  
challenge	  of	   improving	   the	   conduct	   of	   science	  by	  making	   research	  data	  widely	  
available.	  We	  also	  attempt	  to	  provide	  a	  framework	  for	  implementing	  open	  access	  
for	  research	  data	  to	  maximize	  CGIAR’s	  impact	  on	  development.	  The	  remainder	  of	  
this	   paper	   proceeds	   as	   follows;	   firstly	   a	   summary	   of	   the	   diversity	   of	   research	  
data	  produced	  by	   the	   centers	   is	   given,	   followed	  by	  an	  overview	  of	   the	  existing	  
infrastructure	   for	   data	   management	   for	   each	   Center.	   Secondly,	   some	   of	   the	  
limitations	   and	   barriers	   faced	   by	   the	   centers	   in	   their	   process	   to	   mainstream	  
research	   data	   publishing	   are	   addressed.	   The	   paper	   concludes	   with	  
recommendations	  for	  how	  these	  limitations	  and	  barriers	  can	  be	  tackled.	  
Types	  of	  research	  data	  	  
In	   close	   collaboration	   with	   national	   research	   institutions	   almost	   10,000	  
scientists,	   researchers,	   technicians,	   are	   collecting,	   analyzing	   and	   synthesizing	  
data	   on	   smallholder	   agricultural	   systems	   in	   Asia,	   Africa	   and	   Latin	   America.	  	  
While	   the	   overall	  mandate	   of	   the	   CGIAR	   Centers	   is	   the	   same,	   each	   Center	   has	  
their	   own	   research	   emphasis	   resulting	   in	   a	   vast	   variety	   of	   different	   kinds	   of	  
research	   data.	   	   The	   following	   section	   does	   not	   attempt	   to	   give	   an	   exhaustive	  
description	   of	   the	   data	   we	   produce,	   but	   rather	   to	   provide	   an	   overview	   of	   the	  
main	  types	  and	  their	  characteristics.	  
Long	  term	  trials	  
Centers	  that	  are	  focusing	  on	  breeding	  of	  food	  crops	  and	  livestock	  need	  multiple	  
location	  trials	  over	  many	  years	  before	  a	  new	  genotype	  can	  be	  fully	  evaluated,	  so	  
do	   agroforestry	   and	   forestry	   trials	   that	   are	   working	   with	   slow	   growing	  
perennials.	   Centers	   working	   more	   generally	   on	   combined	   systems	   research,	  
natural	   resource	  management	   and	   climate	   change	   also	   have	   research	   covering	  
decades	  in	  order	  to	  characterize	  and	  influence	  changes	  in	  these	  systems.	  	  	  These	  
kinds	   of	   data	   are	   often	   collected	   in	   various	   consecutive	   projects	   and	   technical	  
and	  scientific	  staff	  that	  collect	  and	  analyze	  the	  data	  can	  change.	  From	  a	  scientific	  
point	   of	   view	   it	   is	   not	   desirable	   or	   useful	   to	   make	   the	   data	   available	   before	  
meaningful	  intermediate	  results	  have	  been	  collected.	  	  
One-­‐off	  data	  collections	  
Some	  data	  are	  project	  specific	  and	  aimed	  at	  answering	  specific	  research	  and/or	  
development	   related	   questions,	   often	   analyzed	   and	   published	   as	   part	   of	   the	  
project	   deliverables.	   Publication	   should	   be	   written	   within	   a	   reasonable	  
timeframe	   after	   which	   the	   data	   can	   be	   publically	   released	   together	   with	   the	  
publication.	  What	  is	  a	  reasonable	  timeframe	  does	  not	  only	  vary	  from	  researcher	  
to	  researcher,	  but	  by	  discipline.	  Economics	  publishing	  moves	  much	  slower	  than	  
other	  fields.	  It	  is	  not	  uncommon	  for	  a	  paper	  to	  reach	  publication	  5-­‐10	  years	  after	  
the	  original	  data	  was	  collected.	  
PAGE	  -­‐	  4	  -­‐	  
Baseline	  data	  
Baseline	   data,	   either	   household	   surveys	   or	   biophysical	   surveys	   used	   either	   for	  
basic	   characterization	   of	   a	   new	  project	   site,	   or	   for	   an	   impact	   evaluation	   of	   the	  
project.	   The	   data	   are	   collected	   as	   part	   of	   the	   deliverables	   of	   the	   project,	   but	  
budget	  constraints	  seldom	  allow	  sufficient	  sample	  size	  or	  rigid	  sampling	  designs	  
to	  allow	  the	  use	  of	  the	  data	  for	  peer-­‐	  reviewed	  publications.	  	  These	  data	  usually	  
get	  analyzed	  and	  results	  presented	  in	  donor	  reports.	  For	  these	  kinds	  of	  data	  the	  
curation	  costs	  are	  actually	  very	  high	  as	  they	  are	  not	  accompanied	  by	  a	  scientific	  
publication	   that	  provide	   the	  necessary	  metadata	  and	  methodologies	  are	   stored	  
in	  various	  versions	  on	  private	  computers.	  	  The	  data	  has	  limited	  usability	  for	  the	  
project	  itself,	  but	  baseline	  data	  sets	  combined	  from	  multiple	  projects	  potentially	  
form	  high	  information	  assets	  for	  the	  institutes	  and	  the	  public.	  There	  is	  a	  tradeoff	  
between	  releasing	  the	  data	  to	  do	  comparative	  analysis	  or	  to	  wait	  until	  the	  follow	  
up	  study	  is	  completed.	  
Genomic	  data	  
Some	   Centers	   are	   involved	   in	   genome	   sequencing	   projects,	   specifically	   in	  
generating	   the	   NGS	   (next	   generation	   sequencing)	   data	   for	   gene-­‐phenotype	  
association	   studies.	   These	   studies	   can	   generate	   multi-­‐terabases	   of	   sequencing	  
data.	  One	  of	  the	  key	  challenges	  is	  to	  devise	  scalable	  and	  robust	  data	  management	  
and	   data	   sharing	   solutions.	   	   High-­‐performance	   computing	   and	   storage	   are	  
required	  to	  efficiently	  process	  data	  generated	  by	  NGS.	  Bioinformatics	  support	  is	  
integral	   to	   address	   data	   management	   systems	   dealing	   with	   efficient	   storage,	  
retrieval,	   data	  mining,	  data	  analysis	   and	  making	  data	  available	   to	   the	  public	   at	  
the	  appropriate	  time.	  
Data	  collected	  as	  part	  of	  a	  research	  thesis	  
Various	   projects	   have	   a	   specific	   capacity	   building	   requirement,	   whereby	  
postgraduate	   students	   collect	   a	   substantial	   part	   of	   the	   data.	   	   The	   data	   is	   to	   be	  
used	   in	   peer-­‐reviewed	   publications	   as	   partial	   requirement	   for	   their	   degrees.	  	  
Data	  can	  only	  be	  publically	  released	  after	  the	  student	  has	  published	  their	  papers	  
or	  their	  thesis.	  	  
Value-­‐	  added	  secondary	  datasets	  
A	  large	  proportion	  of	  the	  work	  of	  a	  CGIAR	  scientist	  is	  to	  review	  and	  analyze	  pre-­‐
existing	   data	   that	  was	   not	   gathered	   or	   collected	   by	   the	   authors	   of	   the	   current	  
research	  project.	  Usually	  it	  has	  been	  collected	  by	  another	  organization	  or	  source	  
or	  data	  collected	  from	  government	  publications.	  Typical	  secondary	  datasets	  that	  
are	  used	  are	  meteorological	  datasets,	  remotely	  sensed	  data	  often	  in	  the	  form	  of	  
satellite	  images	  or	  aerial	  photographs,	  panel	  data	  sets	  for	  rural	  households.	  	  The	  
secondary	   datasets	   are	   shared	   with	   CGIAR	   scientists	   under	   specific	   user	  
agreements	   or	   licenses	   and	   cannot	   be	   publically	   shared	   by	   the	   scientists.	   Data	  
products	  derived	  from	  these	  data	  can	  be	  shared	  however,	  without	  the	  raw	  data.	  
Spatial	  data	  
While	   most	   data	   generated	   in	   the	   CGIAR	   research	   activities	   has	   a	   spatial	  
component	  (with	  exception	  of	  pure	  lab	  analysis	  not	  related	  to	  specific	  locations)	  
the	   GIS	   units	   of	   the	   individual	   centers	   collect,	   transform	   and	   generate	   a	   large	  
amount	  of	  spatial	  data.	  These	  fall	  into	  three	  categories:	  
PAGE	  -­‐	  5	  -­‐	  
a. Spatial	  data	   (in	   form	  of	  polygons	  and	   raster,	   satellite	   imagery)	  obtained	  
from	   other	   parties	   like	   NASA,	   JRC,	   National	   Geographical	   Institutes,	  
Universities	  and	  other	  research	  institutions,	  NGOs,	  private	  companies	  etc.	  
This	  is	  used	  for	  analysis,	  mapping,	  targeting	  within	  the	  work	  of	  the	  units.	  
Often	   this	   original	   data	   is	   curated	   and	   improved	   and	   represents	   a	   new	  
international	   public	   good	   and	   falls	   under	   value-­‐added	   secondary	  
datasets.	  	  
b. Geo	   referenced	  data	   generated	   by	   projects	   and	   the	  GIS	   units	  within	   the	  
centers.	   This	   can	   be	   any	   other	   form	   of	   data	   collected	   such	   as	   socio	  
economic	  surveys,	  soil	  samples,	  germplasm	  collections	  etc.	  It	  is	  then	  often	  
combined	  with	  other	  spatial	  data	  sets	  as	  mentioned	  under	  a)	  and	  utilized	  	  
for	  further	  analysis.	  
c. Statistical	  	  and	  other	  related	  data	  (climate	  from	  stations)	  that	  is	  collected	  
from	  national	  and	  other	  institutions	  such	  as	  subnational	  crop	  production	  
data	  or	  poverty	  statistics.	  This	  is	  georeferenced,	  often	  extrapolated	  and	  if	  
necessary	   further	   disaggregated	   and	   converted	   into	   common	   GIS	   data	  
formats	   and	   made	   available	   to	   the	   public	   and	   other	   centers	   again	   as	  
value-­‐added	  secondary	  datasets.	  	  
Data	  collected	  in	  a	  private	  public	  partnership	  project	  
When	   working	   with	   private	   companies	   some	   of	   the	   data	   and	   information	   is	  
highly	   sensitive	   and	   usually	   the	   centers	   sign	   confidentiality	   agreements	   that	  
state	   explicitly	  what	   the	   data	   can	   be	   used	   for	   and	  which	   data	   products	   can	   be	  
made	  publically	  available.	  
R	  &	  D	  Datasets	  
CGIAR	   has	   programs	   that	   are	   designed	   and	   funded	   purposely	   to	   create	   public	  
databases	   on	   key	   agricultural	   indicators.	   	   The	   databases	   are	   dynamic	   and	   are	  
updated	   on	   regular	   intervals.	   Technical	   and	   scientific	   staff	   that	   collect	   and	  
analyze	   the	   data	   can	   change.	   Here	   only	   aggregated	   data	   are	   made	   available	  
together	  with	  the	  methods.	  
Partner	  data	  
Most	  of	  our	  projects	  are	  done	  together	  with	  partners	  that	  sometimes	  contribute	  
their	  own	  non-­‐CGIAR	  funded	  data.	  Here	  it	  is	  important	  that	  the	  partner	  decides	  
what	  the	  data	  should	  be	  used	  for	  and	  who	  should	  have	  access	  to	  the	  data.	  Often	  
the	  partner	  has	  only	  agreed	  to	  joint	  copyright	  of	  data	  products,	  but	  not	  the	  actual	  
data	  sets.	  	  
Research	  data	  infrastructure	  across	  the	  centers	  
While	   publishing	   research	   data	   as	   a	   research	   output	   is	   nothing	   new	   for	   the	  
CGIAR	   Centers,	   in	   the	   past	   it	   was	   confined	   to	   flagship	   programs	   and	   projects.	  	  
Data	   sharing	   agreements	   and	   data	   management	   policies	   were	   developed	   and	  
data	   managers	   and	   curators	   hired	   based	   on	   the	   individual	   needs	   of	   these	  
projects.	  Centers	  did	  not	  have	  the	  infrastructure,	  capacity	  and	  incentive	  to	  make	  
more	  of	   their	   research	  data	  widely	  available.	   In	  early	  2008	  a	   first	   attempt	  was	  
made	  by	  the	  Alliance	  Deputy	  Executive	  (ADE)	  to	  review	  data	  management	  across	  
the	   centers	   and	  explore	  mechanisms	   for	   strengthening	   collective	  action	  among	  
PAGE	  -­‐	  6	  -­‐	  
centers	   (Anon,	   2008).	   A	   lot	   has	   changed	   since	   then	   and	   Center	  wide	   research	  
data	   management	   policies	   are	   becoming	   the	   norm	   rather	   than	   the	   exception.	  
Increasing	   investment	   in	   research	   support	   units	   and	   or	   staff	   and	   the	  
implementation	   of	   data	   archives	   reflect	   the	   change	   of	   mind	   set	   within	   center	  
management	  as	  well	  scientists.	  	  Most	  centers	  have	  already	  or	  are	  in	  the	  process	  
of	  setting-­‐up	  'OAI-­‐compliant'1	  data	  repositories,	  which	  allow	  the	  easy	  harvest	  of	  
metadata	  from	  one	  repository	  to	  another.	   	  Table	  1	  provides	  a	  brief	  summary	  of	  
the	  current	   research	  data	   infrastructure	  across	   centers.	  Details	   for	  each	  Center	  
are	  given	  below.	  
	  
Table	   1:	   Overview	   of	   existing	   infrastructure	   for	   research	   data	   management	   &	  
bioinformatics	  across	  the	  different	  institutes.	  
Research	   Centralized	  
data	   Data	   Data	  
Management	   Management	   Geoinformatics	   Biometrics	   Archiving	  &	  
Centre	   Policy	   Unit	   Unit	   Unit	   sharing	  
Africa	  Rice	   YES	   YES	   YES	   YES	   Since	  July	  2012	  
Bioversity	   In	  process	   YES	   	   	   Since	  Sept.	  2013	  
CIAT	  	   YES	   YES	   YES	   	   In	  process	  
CIFOR	   YES	   	   	   	   In	  process	  
CIMMYT	   YES	   Recruiting	   YES	   YES	   In	  process	  
CIP	   YES	   YES	   YES	   YES	   YES	  
ICARDA	   In	  process	   YES	   YES	   YES	   In	  process	  
ICRAF	   YES	   YES	   YES	   YES	   Since	  2011	  
ICRISAT	   In	  process	   YES	   YES	   YES	   YES	  
IFPRI	   YES	   Recruiting	   No	   	   Since	  2005	  
IITA	   In	  process	   In	  process	   YES	   YES	   In	  process	  
Partial	  
(shared	  
ILRI	   YES	   YES	   YES	   YES	   servers,	  data	  
portal	  in	  
development)	  
YES	  
IRRI	   (Currently	  being	   YES	   YES	   YES	   In	  process	  	  
updated)	  
IWMI	   YES	   	   YES	   	   YES	  
World	  Fish	   YES	   YES	   YES	   	   YES	  
	  
	   	  
	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  
1	  The	  Open	  Archives	  Initiative	  (OAI)	  develops	  and	  promotes	  interoperability	  standards	  (Protocol	  
for	  Metadata	  Harvesting)	  that	  aim	  to	  facilitate	  the	  efficient	  dissemination	  of	  content.	  	  
PAGE	  -­‐	  7	  -­‐	  
Africa	  Rice	  
AfricaRice	  research	  support	  unit,	  the	  Data	  Integration	  and	  Biometrics	  Unit	  (DIB	  
Unit)	  is	  responsible	  for	  data	  management	  and	  to	  assist	  AfricaRice	  research	  staff	  
(and	   partners)	   to	   enhance	   the	   efficiency	   and	   efficacy	   of	   data	   management	  
processes:	   i)	   data	   acquisition,	   quality	   control	   and	   storage:	   backstopping	   of	  
AfricaRice	  research	  staff	  in	  the	  management	  of	  their	  experimental	  data,	  including	  
automating	  data	   gathering	  procedures	   and	  quality	   control;	   ii)	   data	   integration,	  
analysis	  and	  visualization:	  biometrics’	  advice	  and	  support	  to	  AfricaRice	  research	  
staff	   on	   experimental	   designs,	   GxE	   analyses,	   mapping	   quantitative	   trait	   loci	  
(QTLs),	  analyzing	  genetic	  diversity,	  development	  of	  decision	  support	  tools,	  etc.	  In	  
addition,	   the	   DIB	   unit	   provides	   institute-­‐wide	   data	   standards,	   including	   a	  
common	   vocabulary	   and	   the	   use	   of	   standardized	   formats	   for	   primary	   and	  
metadata.	   It	   enables	   AfricaRice	   and	   partners	   to	   more	   easily	   integrate,	  
synchronize	  and	  consolidate	  data	   from	  different	  programs,	   exchange	  data	  with	  
other	  organizations	   in	  a	   common	   format,	   and	  communicate	  effectively	   through	  
shared	  terms	  and	  reporting	  formats.	  
	  The	  Unit	  consists	  of	  two	  statisticians	  (one	  IRS	  and	  one	  Support	  Staff),	  one	  data	  
manager	  (Support	  Staff)	  and	  one	  Consultant	  (Support	  Staff).	  Africa	  Rice	  is	  using	  
Dataverse	  as	  its	  research	  data	  archive.	  It	  was	  released	  on	  July	  2012	  and	  consists	  
of	  23	  studies	  as	  of	  today.	  AfricaRice	  developed	  a	  Data	  Management	  and	  Sharing	  
Policy	  in	  1999.	  Revised	  in	  2013,	  the	  document	  is	  in	  the	  process	  for	  approbation	  
by	  the	  AfricaRice	  Board	  of	  Trustees.	  
Bioversity	  
Bioversity	   has	   been	   engaged	   in	   developing	   the	   CGIAR	  Open	   Access	   policy	   and	  
guidelines	  together	  with	  other	  CGIAR	  Centers.	  Once	  approved,	  these	  documents	  
will	  form	  the	  basis	  for	  Bioversity’s	  ‘Open	  Access	  and	  Data	  Management	  Plan’.	  We	  
have	  already	  drafted	  this	  plan	  based	  on	  the	  current	  version	  of	  the	  OA	  policy,	  and	  
we	  hope	  to	  have	  the	  policy	  approved	  in	  early	  2014.	  	  
The	  responsibility	  for	  the	  collection	  and	  management	  of	  research	  data	  is	  with	  the	  
Research	  Planning	   and	  Monitoring	  Unit	   at	  Bioversity.	   	   The	  Library	   staff	  within	  
this	  unit	  are	  responsible	  for	  1)	  collecting	  the	  datasets;	  2)	  ensuring	  completeness	  
and	   accuracy	   of	   the	   associated	   metadata;	   3)	   finding/linking	   the	   datasets	   with	  
associated	  publications	  and	  tools,	  and	  4)	  inputting/releasing	  it	  on	  the	  Dataverse	  
network.	   	  No	  extra	  personnel	  have	  been	  recruited	  for	  these	  tasks	  so	  far.	  Whilst	  
Bioversity	  does	  not	  have	  a	  GIS	  unit,	  there	  are	  3	  GIS	  specialists	  based	  in	  regional	  
offices	  and	  HQ.	  	  
Bioversity	   recently	   established	   an	   open	   access	   dataset	   repository	   on	   the	  
Dataverse	  (Harvard)	  Platform.	  	  	  At	  present	  we	  have	  datasets	  available	  there	  from	  
research	  carried	  out	  in	  2012,	  and	  we	  are	  currently	  collecting	  other	  datasets	  from	  
previous	  years.	  At	  present,	  the	  metadata	  of	  each	  dataset	  is	  available	  immediately	  
but	  the	  actual	  datasets	  are	  only	  released	  once	  the	  scientist	  gives	  their	  approval.	  	  
In	  order	  to	  collect	  datasets	  systematically	  Bioversity	  has	  in	  place	  focal	  points	  in	  
its	  5	  research	  programs	  that	  assist	  the	  Library	  staff	  primarily	  with	  collecting	  the	  
datasets	  from	  their	  scientific	  staff.	  In	  collaboration	  with	  partners	  Bioversity	  also	  
manages	  a	  number	  of	  specialized	  databases	  and	  datasets	  relating	  to	  agricultural	  
and	  forest	  biodiversity	  such	  as:	  
	  
	  
PAGE	  -­‐	  8	  -­‐	  
• Collecting	  Mission	  database	  (http://bioversity.github.io/geosite/)	  	  
• Musa	  Germplasm	  Information	  System	  (MGIS)	  	  (http://www.crop-­‐
diversity.org/banana/)	  
• EURISCO	  	  (http://eurisco.ecpgr.org/nc/home_page.html)	  
• New	  World	  Fruit	  Database	  (http://nwfdb.bioversityinternational.org/)	  
Bioversity,	   always	   in	   collaboration	   with	   its	   partners,	   also	   develops	   data	  
standards	  for	  documentation	  and	  protocols	  to	  enable	  information	  sharing.	  	  
The	   Descriptor	   Lists	   publications	   that	   are	   published	   by	   Bioversity,	   assist	  
researchers	  and	  genebank	  curators	  to	  improve	  their	  capacity	  to	  describe,	  store,	  
manage	   and	   share	   information	   about	   plant	   resources,	   whether	   stored	  
in	  genebanks	  or	   growing	   in	   their	   natural	   environments	   (e.g.	   FAO/Bioversity	  
International,	  2012).	  
Center	  for	  International	  Forestry	  Research	  (CIFOR)	  
CIFOR’s	   Director	   General	   approved	   and	   promulgated	   a	   Research	   Data	  
Management	  Policy	   and	  Guidelines	   and	  Procedures	   in	   July	   2013.	   These	   identify	  
project	   managers	   as	   having	   primary	   responsibility	   for	   management,	   including	  
archiving,	  of	  research	  data;	  require	  preparation	  and	  implementation	  of	  Research	  
Data	  Management	  Plans;	  and	  identify	  the	  Center’s	  Data	  and	  Information	  Services	  
Unit	  as	  having	  responsibility	  for	  ensuring	  storage	  of	  and	  access	  to	  archived	  data,	  
with	  support	   from	  the	  GIS	  Unit	   for	  spatial	  data.	  CIFOR	   is	  currently	  recruiting	  a	  
Data	  Librarian	   to	  play	  a	  primary	   role	   in	   supporting	  project	  managers	  and	  staff	  
implementing	  the	  Policy.	  	  
International	  Potato	  Center	  (CIP)	  
Around	  September	  2002,	  CIPs	  director	  of	   research	   implemented	  a	   service	  unit	  
charged	   with	   building	   and	   maintaining	   the	   institutional	   'memory'	   of	   research	  
data	  (todays	  rough	  equivalent	  of	  KM	  on	  scientific	  data).	  To	  this	  end	  it	  provides	  
both	   centralized	   archiving,	   documentation	   and	   databasing	   services	   as	   well	   as	  
support	  to	  researchers	  in	  data	  analysis	  and	  visualization.	  This	  includes	  also	  the	  
de-­‐novo	   production	   of	   databases	   and	   tools	   as	   appropriate	   versus	   re-­‐using	  
existing	   software.	   The	   latter	   aspect	   was	   seen	   as	   intrinsic	   to	   the	   concept	   of	   a	  
'memory'	   since	   no	  memory	   'makes	   sense'	  without	   'data'	   and	   'data	   processing'	  
tools.	  Since	  the	  beginning,	  the	  unit	  has	  been	  involved	  in	  developing	  community	  
data	   documentation	   standards	   as	   well	   as	   in	   the	   application	   of	   open-­‐source	  
software	  principles	  and	  the	  promotion	  of	  open	  access.	  The	  unit	  was	  called	  at	  the	  
time	  'research	  informatics	  unit	  (RIU)'	  and	  recently	  renamed	  to	  'Integrated	  IT	  and	  
Computational	  Research.	  The	  unit	  provides	  services	  in	  documentation,	  database	  
design	  and	  development,	  PC-­‐based	  and	  mobile	  application	  development,	  GIS	  and	  
molecular	   bioinformatics.	   The	   unit	   has	   been	   part	   of	   former	   CGIAR	   KM-­‐related	  
activities	  like:	  the	  IPD	  (the	  Intergenebank	  Potato	  Database,	  a	  database	  to	  cross-­‐
reference	   potato	   collections	   and	   share	   passport	   and	   evaluation	   data,	   since	  
~1995),	  the	  SINGER	  community	  as	  a	  repository	  of	  long-­‐term	  data	  on	  germplasm	  
(since	  end	  of	  1990ies;	  now	  moving	  towards	  Genesys-­‐2	  software	  from	  the	  Global	  
Trust),	   the	   GenerationCP	   on	   breeding	   materials	   including	   the	   'composite	  
genotype	  sets',	  and	  the	  ICT-­‐KM	  initiative.	  
PAGE	  -­‐	  9	  -­‐	  
International	  Center	  for	  Agricultural	  Research	  in	  the	  Dry	  Areas	  (ICARDA)	  
ICRADA	   has	   a	   Geoinformatics	   Unit	   for	   all	   the	   geospatial	   database	   and	   related	  
products	   in	   coordination	   with	   Biometrics,	   GRS,	   CODIS	   etc.	   At	   present	   data	   is	  
stored	  in	  several	  individual	  archives	  and	  the	  institute	  is	  in	  the	  process	  of	  setting	  
up	   centralized	  data	   archiving	   facility	   for	   collecting,	   streamlining,	   archiving	   and	  
sharing,	   supported	   by	   a	   research	   data	   management	   policy.	   Less	   than	   10%	   of	  
research	  data	  is	  currently	  accessible	  through	  the	  public	  domain.	  
International	  Center	  for	  Tropical	  Agriculture	  (CIAT)	  
CIAT	  strongly	  believes	  that	  open	  access	  to	  data	  produced	  by	  Staff	  and	  partners	  
strategically	   supports	   our	   mission	   and	   ensures	   transparency	   and	   equity	   in	  
exploitation	   of	   the	   opportunities	   created.	  	   Even	   before	   implementing	   a	   Data	  
Management	   policy	   in	   February	   2012,	   CIAT	   led	   sharing	   of	   geo-­‐spatial	   data	  
together	  with	  associated	  information,	  at	  different	  scale	  in	  an	  organized,	  standard	  
and	   consistent	  way.	  With	   the	   implementation	   of	   the	   Data	  management	   policy,	  
CIAT	  is	  in	  the	  process	  of	  putting	  40	  years	  of	  data	  from	  the	  institutional	  memory	  
(includes	  phenotypic,	  genotypic,	  climatic,	  spatial	  and	  socio-­‐economic	  data)	  in	  the	  
public	   domain.	  	   The	   work	   involves	   compilation	   of	   data	   from	   the	   institutional	  
memory,	   cleaning	   and	   re-­‐formatting,	   respecting	   current	   data	   ontologies	   and	  
standards,	  and	  publication	  through	  appropriate	  mechanisms.	  	  
While	   CIAT	   encourage	   the	   data	   sharing	   culture,	   data	   management	   function	   is	  
currently	   spread	   across	   research	   areas	   and	   different	   units.	   The	   information	  
technology	   unit	   is	   in	   charge	   of	   data	   storage.	   Starting	   2014,	   we	   expect	   to	  
consolidate	   some	   of	   the	   ongoing	   activities	   under	   Library	   and	   information	  
management.	  In	  addition,	  CIAT	  also	  hosts	  the	  CCAFS	  data	  manager.	  	  
Most	  of	  the	  field	  data	  are	  now	  collected	  using	  a	  hand	  held	  PDA,	  which	  facilitates	  
easy	   backup	   and	   consolidation.	  We	   are	   beginning	   to	   use	   a	   CIAT	   Dataverse,	   to	  
facilitate	  data	  sharing	  associated	  with	  research	  publications.	  Effort	  is	  ongoing	  to	  
share	  trial	  data	  from	  commodity	  programs	  via	  Agtrials.	  Given	  the	  very	  large	  size	  
of	   bioinformatics	   and	   genomic	   data	   sets,	  further	   discussion	   is	   ongoing	   to	   see	  
how/	  and	  if	  we	  share	  terra	  bites	  of	  raw	  data	  and	  mechanisms	  for	  doing	  that	   in	  
terms	  of	  server	  capacity	  etc.	  	  Quite	  a	  lot	  of	  data	  is	  already	  made	  available	  through	  
GCP	  https://www.integratedbreeding.net/.	  	  
International	  Crops	  Research	  Institute	  for	  the	  Semi-­‐Arid	  Tropics	  (ICRISAT)	  
Recognizing	   the	   value	   of	   research	   data	   aggregation,	   analysis	   and	   availability,	  
ICRISAT	   established	   a	   Data	   Management	   Unit	   in	   August	   2010	   to	   mainstream,	  
support	   and	  manage	   research	   data	   for	   its	   preservation	   and	   publication	   across	  
the	  institute	  and	  to	  the	  extent	  possible,	  to	  make	  the	  data	  publicly	  available	  for	  all	  
users	  to	  use.	  The	  unit	  consists	  of	  a	  full	  time	  Senior	  Data	  Manager	  supported	  by	  
consultants,	   research	   fellows	  and	  Biometric	  unit	   of	   ICRISAT.	   	  Biometric	  unit	   at	  
ICRISAT	  work	  on	  data	  quality	  activities	  with	  a	  full	  time	  biometrician	  along	  with	  
five	  statisticians	  and	  research	  scholars.	  	  
The	  DMU	  of	  ICRISAT	  have	  come	  up	  with	  various	  data	  management	  platforms	  for	  
better	   pedigree	   management,	   breeding	   practice	   analysis,	   survey	   management,	  
climate	  prediction	  activities,	  etc.	  ICRISAT	  has	  also	  introduced	  several	  innovative	  
platforms	   to	   its	   scientists	   that	   includes	   open	   as	   well	   as	   commercial	   such	   as	  
Agrobase,	  aWhere	  (cloud	  based	  data	  management	  tool),	  VDSA	  (data	  warehouse),	  
IBP	  etc.	  and	  also	  developed	  several	  applications	  that	  helps	  better	  visualization	  &	  
PAGE	  -­‐	  10	  -­‐	  
sharing	  of	  research	  data	  with	  GIS	  analysis	  capabilities.	  The	  DMU	  also	  defined	  and	  
developed	   workflows	   and	   protocols	   for	   managing	   research	   data	   that	   being	  
produced	  by	  various	  research	  programs.	  Training	  programs	  and	  workshops	  have	  
been	   organized	   to	   the	   scientists	   and	   partners	   working	   on	   ICRISAT’s	   research	  
programs	  on	  use	  of	  new	  tools	  and	  data	  management	  platforms.	  ICRISAT is	  using	  
Dataverse	  as	  a	  platform to publish datasets. Nearly about 400 datasets in 5 formats in 
42 study areas have been uploaded in to ICRISAT-Dataverse application. ICRISAT	  
has	  adopted	  Open	  Access	  Policy	  and	  launched	  repository	  in	  May	  2011	  to	  provide	  
an	  easy	  interface	  for	  researchers,	  practitioners,	  or	  web-­‐connected	  farmers	  to	  use,	  
build	  on	  and	  share	  research	  conducted	  at	  ICRISAT. 
DMU is working with a strategy to manage the research data	   that	  being	  produced	  
across	  the	  institute,	  covering	  all	  the	  locations	  and	  thematic/program	  areas;	  and	  
organize	  it	  efficiently	  in	  a	  “central	  data	  repository”	  for	  allowing	  Global	  scientific	  
community	  “access	  to	  the	  data	  to	  get	  successful	  results”	  for	  addressing	  pressing	  
Global	  issues.	  	  	  Efforts	  have	  also	  been	  made	  to	  bring	  a	  cultural	  change	  at	  scientist	  
level	  by	  continuous	  interactions	  and	  by	  educating	  them	  on	  the	  advantages	  of	  the	  
data	  sharing.	  This	  is	  a	  win-­‐win	  situation	  for	  the	  institutes	  as	  well	  as	  the	  scientific	  
community.	  For	  this,	  ICRISAT	  has	  come	  up	  with	  Data	  Management	  Policy	  and	  has	  
been	  put	  forward	  for	  the	  research	  committee	  approvals.	  
International	  Food	  Policy	  Research	  Institute	  (IFPRI)	  
IFPRI’s	  Communication	  and	  Knowledge	  Management	  Division,	  has	  a	  Knowledge	  
Management	  Unit	  considered	  a	  support	  unit	  for	  research.	  	  The	  KM	  unit	  provides	  
support	   for	   preparing	   the	   data	   documentation,	   data	   curation,	   creates	   the	  
connection	   with	   associated	   publications	   and	   tools,	   monitors	   it	   uses	   and	   also	  
serves	  as	  a	  user	  support	  and	  Q&A	  center.	  	  	  At	  present	  the	  unit	  has	  1	  full	  time	  data	  
curator	  (with	  research	  background),	  a	  half	  time	  knowledge	  manager,	  that	  helps	  
with	  the	  taxonomies	  and	  metadata	  and	  citation	  statistics.	  
The	  Communications	  and	  Knowledge	  Management	  Division	  is	  pursuing	  to	  hire	  a	  
Data	  Manager,	  who	  will	  provide	  support	  during	  the	  research	  cycle,	  development	  
of	   data	   plans	   when	   the	   project	   is	   conceived,	   conceptual	   frameworks,	   models,	  
questionnaires	  and	  analysis.	  	  	  	  
IFPRI	  has	  a	  Dataset	  Policy	  existing	  since	  2000,	  was	  updated	   in	  2010,	  making	   it	  
mandatory	   for	   researchers	   to	   provide	   open	   access	   to	   the	   relevant	   data,	   while	  
safeguarding	   the	   privacy	   of	   participants	   and	   protecting	   confidential	   and	  
proprietary	  information.	  	  IFPRI	  will	  make	  all	  primary	  and	  value-­‐added	  secondary	  
datasets	  collected	  after	  January	  1,	  1999,	  publicly	  available	  two	  (2)	  years	  after	  all	  
data	  collection	  ceases	  or,	  before	  two	  years,	  at	  the	  time	  of	  a	  major	  publication	  by	  
the	  lead	  data	  collector.	   	  The	  datasets	  are	  released	  after	  the	  lead	  researcher	  and	  
Division	  Director	  has	  approved	  it.	  
In	  2012,	  11	  datasets	  were	  prepared	  for	  posting	  in	  the	  open	  repository	  at	  IFPRI	  
Dataverse,	   and	  now	  we	  have	   a	   total	   of	   97	  datasets.	   	   These	   datasets	   have	   been	  
cited	  120	  times	  (according	  to	  ISI	  data	  citation	  index)	  and	  have	  been	  downloaded	  
19,289	  times.	  Whenever	  possible	  we	  linked	  the	  publication	  to	  dataset	  and	  vice-­‐
versa	   so	   that	   the	   discoverability	   of	   dataset	   and	   related	   publications	   becomes	  
easier.	   Preparing	   a	   dataset	   for	   the	   open	   repository,	   takes	   between	   2.5-­‐6	   days	  
depending	  on	  how	  the	  raw	  data	  is	  provided	  to	  the	  unit	  by	  the	  research	  leader.	  	  	  
PAGE	  -­‐	  11	  -­‐	  
International	  Institute	  of	  Tropical	  Agriculture	  (IITA)	  
An	  in-­‐house	  survey	  in	  early	  2013	  revealed	  that	  most	  research	  data	  is	  stored	  on	  
individual	   scientists	   and	   units	   computers	   in	   various	   different	   software	   and	  
statistical	   packages,	   including	   Excel,	   SPSS,	   Stata,	   Access,	   and	   SQL.	   Data	   on	  
research	   projects/outputs	   is	   collected	   at	   an	   ad	   hoc	   basis	   (through	  
questionnaires,	  emails,	   letters,	   interviews,	  group	  discussions,	  reviews	  of	  official	  
reports).	  The	  data	  is	  produced	  and	  used	  in	  a	  variety	  of	  formats,	  including	  digital,	  
print	   and	   physical.	   Most	   of	   research	   data	   are	   still	   very	   fragmented.	   IITA	  
Biometrics	   Unit	   conducts	   regular	   training	   and	   backstopping	   on	   data	   collection	  
and	  analysis	  and	  supports	  genome	  sequencing	  projects,	  specifically	  in	  generating	  
the	   NGS	   (next	   generation	   sequencing)	   data	   for	   gene-­‐phenotype	   association	  
studies.	  IITA’s	  GIS	  unit	  manages	  a	  geospatial	  database	  and	  related	  products.	  	  The	  
survey	   also	   revealed	   that	   researchers	  were	   highly	   supportive	   of	   having	   a	   data	  
and	  information	  framework	  as	  they	  felt	  that	  it	  would	  reduce	  duplication	  of	  data	  
collection	  and	  survey	  fatigue	  among	  NARS	  and	  other	  partners.	  
IITA	  has	  taken	  steps	  to	  improve	  data	  management.	  One	  of	  the	  first	  successes	  is	  a	  
relational	   database	   on	   Cassava,	   which	   provides	   cassava	   breeders	   and	  
researchers	  access	   to	  data	  and	   tools	   in	  a	   centralized,	  user-­‐friendly	  and	  reliable	  
database.	   IITA	   is	   looking	   at	   developing	   other	   crop	   data	   databases	   using	   the	  
Cassavabase	  as	  a	  model.	  The	  formulation	  of	  the	  functional	  requirements	  for	  this	  
comprehensive	   Crop	   Breeding	   Data	   Management	   Platform	   based	   on	   breeders’	  
needs	   is	   underway.	   This	   platform	   would	   include	   the	   management	   of	   crop	  
information	  and	  the	  development	  of	  applications	  to	  facilitate	  breeding	  processes	  
and	   agronomic	   field	   trials	   data,	   soil	   science	   data,	   plant	   health	   information	  
(pathology	  and	  entomology)	  and	  nutrition/post-­‐harvest	  characteristics.	  
IITA	   is	   also	   envisioning	   implementing	   an	   E-­‐Research	   infrastructure	   that	   will	  
support	   research	  and	   that	   enables	   researchers	   to	  undertake	  excellent	   research	  
and	   deliver	   innovation	   outcomes,	   provide	   the	   means	   to	   manipulate,	   manage,	  
share,	   integrate	   and	   reuse	   research	   data,	   and	   enables	   research	   teams	   to	   share	  
resources	  and	  work	  together	  more	  effectively.	  	  
International	  Maize	  and	  Wheat	  Improvement	  Center	  (CIMMYT)	  
CIMMYT	   is	  dealing	  with	  a	  broad	  variety	  of	  data	  ranging	   from	  maize	  and	  wheat	  
germplasm,	   crop	   data	   at	   the	   field,	   farm,	   community,	   country,	   to	   regional	   and	  
global	   level,	   socioeconomic	   data.	   Data	   are	   handled	   by	   several	   units	   within	   or	  
across	  the	  programs,	  institution	  and	  partners.	  CIMMYT	  has	  been	  a	  key	  developer	  
of	   germplasm	   related	   data	   management	   platforms	   including	   Fieldbook,	  
IMIS/IWIS,	  and	  the	  IBP.	  Data	  volumes	  have	  increased	  rapidly	  and	  with	  that	  the	  
need	   to	   develop	   tools	   that	   allow	   the	   manipulation	   and	   use	   of	   such	   data	   (eg	  
genomics	   applications;	   decision	   support	   tools	   for	   sustainable	   intensification	  
approaches).	  	  In	  2012	  a	  new	  data	  management	  policy	  that	  came	  into	  effect	  that	  is	  
to	   accelerate	  data	   interchange	   internally	   and	   externally.	   	   Six	   data	   coordinators	  
were	  hired	  to	  assist	   in:	  a)	  establishing	  data	  standards,	  documentation	  and	  data	  
curation	   processes,	   b)	   coordinate	   receipt,	   storage,	   manipulation	   and	   quality	  
control	  of	  field	  and	  germplasm	  related,	  c)	  participate	  in	  the	  design,	  development	  
and	   population	   of	   versatile	   institutional	   databases/repositories,	   interfaces	   and	  
output	   tools,	   d)	   introduce	   new	   informatics	   tools	   to	   staff	   and	   collaborators,	  
provide	   on-­‐the-­‐job	   training,	   and	   report	   back	   user	   requirements	   to	   CIMMYT	  
software	  engineers,	  e)	  manage	  and	  document	  the	  dissemination	  of	  data	  (raw	  and	  
analyzed).	  
PAGE	  -­‐	  12	  -­‐	  
There	   are	   currently	   three	  data	   coordinators	   in	   the	  Genetic	  Resources	  Program	  
(GRP)	   who	   work	   for	   the	   Global	   Maize	   Program	   (GMP)	   and	   the	   Global	   Wheat	  
Program	   (GWP)	   handling	   both	  molecular	   data	   and	   breeder’s	   trial	   and	   nursery	  
data	   (phenotypic).	   The	   breeder’s	   nursery	   and	   trial	   data	   mainly	   consists	   of	  
phenotypic	   data	   collected	   in	   the	   field	   during	   the	   crop	   cycle	   and	   can	   be	  
complemented	  by	  quality	  and	  nutritional	  traits	  (such	  as	  nutritional	  value,	  forage	  
traits,	   baking	   quality	   or	   other	   food	   processing	   relevant	   traits)	   or	   molecular	  
information	   at	   variable	   density	   (few	   markers)	   to	   several	   genotype-­‐by-­‐
sequencing	   information	   across	   significant	   parts	   of	   the	   genome.	   The	   socio	  
economics	   program	   (SEP)	   and	   the	   global	   conservation	   agriculture	   program	  
(GCAP)	   share	   three	   data	   coordinators	   who	   work	   in	   the	   regions	   (Africa,	   Asia,	  
Latin	  America	  as	  well	   as	  globally)	  on	  both	  socio	  economic	  data	  and	  agronomic	  
trial	  data,	  their	  standardizations	  and	  platforms.	  
The	  GIS	  unit	  collects	  spatial	  and	  meteorological	  data	  on	  global,	  national	  and	  sub	  
national	  scales	  as	  well	  as	  statistical	  data	  related	  to	  maize	  and	  wheat	  production	  
in	   all	   countries	   producing	   these	   two	   commodities	   which	   is	   then	   converted	   to	  
spatial	   data	   and	   as	   geo	   referenced	   data	   collected	   and	   generated	   in	   specific	  
projects.	  	  	  	  	  	  	  
CIMMYT	  knowledge	  management	  is	  co-­‐chairing	  the	  Wheat	  Data	  Interoperability	  
working	  group	  which	  aims	  to	  coordinate	  worldwide	  research	  efforts	  in	  the	  fields	  
of	  wheat	  genetics,	  genomics,	  physiology,	  breeding	  and	  agronomy.	  This	  part	  of	  the	  
Research	  Data	  Alliance	  initiative.	  
International	  Rice	  Research	  Institute	  (IRRI)	  
IRRI	   has	   various	   research	   support	   groups	   with	   different	   roles	   and	  
responsibilities.	   The	   Research	   Data	   Management	   (RDM)	   group	  
conducts	  regular	  training	  on	  good	  practices	  in	  managing	  research	  data	  to	  ensure	  
that	   data	   are	   well	   documented	   (metadata)	   and	   organized	   systematically	  
(file	  repositories).	  	  Areas	  covered	  include:	  Research	  data	  planning;	  Research	  data	  
collection,	   authentication	   and	   storage;	   Data	   backup	   and	   security;	   and	   Data	  
archival	  and	  sharing.	  RDM	  also	  assist	  research	  staff	  in	  implementing	  these	  good	  
practices	   based	   on	   RDM	   policy	   and	   IP	   Policy.	   The	   Biometrics	   and	   Breeding	  
Informatics	   group	   conducts	  regular	  training	   on	   experimental	   designs	   and	  
statistical	  data	  analysis	  as	  well	  as	  the	  use	  of	  plant	  breeding	  tools	  developed	  in-­‐
house.	   The	   Bioinformatics	   group	   conducts	  training	   on	   Bioinformatics	   and	  
supports	  research	  groups	  by	  providing	  tools	  for	  SNP	  analysis.	  
All	   instruments	   and	   lab	   equipment	   are	   ISO	   certified	   and	   in-­‐house	   verification	  
performed	   regularly,	   to	   ensure	   accuracy	   and	   traceability	   of	   research	   data	  
collected.	   Additional	   flagship	   databases	   that	   are	   maintained	   by	   research	   units	  
are	   ICIS-­‐IRIS	   for	   germplasm	   data,	  World	   Rice	   Statistics	   and	   Household	   survey	  
database	   for	  Social	  Science	  unit,	  Long	  Term	  Continuous	  Cropping	  Experiments,	  
and	  Climate	  data	  for	  Climate	  Unit.	  	  IRRI	  also	  maintains	  the	  Rice	  Knowledge	  bank,	  
which	   is	   the	   world’s	   leading	   repository	   of	   extension	   and	   training	   materials	  
related	  to	  rice	  production.	  	  	  
International	  Livestock	  Research	  Institute	  (ILRI)	  
The	   ILRI	   Research	   Methods	   Group	   consists	   of:	   2	   statisticians,	   5	   systems	  
designers	   /	  managers,	   2	   systems	   administrators	   (high	   performance	   computing	  
systems)	  and	  2	  GIS	  data	  managers	  /	  analysts.	  In	  addition	  we	  have	  a	  large	  number	  
PAGE	  -­‐	  13	  -­‐	  
of	  GIS	  analysts	  sat	  in	  research	  programs	  and	  a	  few	  database	  managers	  and	  data	  
systems	   type	   people.	   There	   is	   also	   a	   bioinformatics	   group	   which	   sits	   in	   the	  
Directorate	  of	  Biosciences	  (BECA	  &	  Animal	  Biosciences).	  Management	  of	  data	  is	  
mainly	   at	   the	   project	   level	   although	   for	   large	   projects	   shared	   SQL	   servers	   are	  
used.	  With	  the	  advent	  of	  the	  CGIAR	  Research	  Programs	  (CRPs)	  ILRI	  is	  currently	  
establishing	  a	  data	  portal	  /	  platform	  for	  both	  Institute	  and	  the	  CRP	  on	  Livestock	  
and	   Fish	   data	   publishing.	   Initially	   the	   data	   portal	  will	   provide	   cataloguing	   and	  
access	  to	  data	  and	  in	  later	  development	  a	  module	  for	  monitoring	  and	  evaluation	  
will	   allow	   for	  meta-­‐analysis	   across	  projects,	   for	  key	   indicators	   (e.g.	   IDO’s).	  The	  
open-­‐source	   platform	   (CKAN)	   will	   also	   allow	   inter-­‐operability	   and	  
communication	  with	  ILRI’s	  knowledge	  management	  system	  (DSpace	  –	  Mahider)	  
and	   other	   platforms.	   A	   revised	   data	   management	   policy	   aligned	   to	   the	   CGIAR	  
Open-­‐access	  policy,	  the	  CGIAR	  Management	  of	  Intellectual	  Assets	  (IA)	  policy	  and	  
various	  CRP	  policies	  in	  development	  is	  currently	  being	  finalized	  with	  detailed	  a	  
detailed	   implementation	  plan	  providing	   ILRI	  staff	  and	  partners	  will	  options	   for	  
collecting,	  managing	  and	  sharing	  their	  data.	  
International	  Water	  Management	  Institute	  (IWMI)	  
IWMI’s	   Information	   and	   Knowledge	   Group	   (IKG)	   provides	   knowledge	  
management	   support,	   connecting	   IWMI’s	   knowledge	   outputs	   through	   the	  
integrated	   Institution-­‐wide	   search	   facility,	   “Poodle”	   that	   allows	   for	   a	  
comprehensive	   search	   across	   peer	   reviewed	   and	   non-­‐peer	   reviewed	   IWMI	  
research	  outputs.	  Amongst	  many	  other	  responsibilities	  IKG	  provides	  publication	  
support	   and	  manages	   IWMI’s	   internal	   publications	   such	   as	   the	   IWMI	   working	  
paper	   and	   IWMI	   research	   report	   series.	   IWMI	   currently	   has	   no	   separate	   Data	  
Management	   Unit,	   but	   its	   Research	   data	  Management	   Policy	   stipulates	   that	   all	  
data	   must	   be	   archived	   in	   IWMI’s	   central	   repository	   (WDP)	   with	   standard	  
metadata	   as	   early	   as	   possible	   after	   collection	   and	   processing	  with	   appropriate	  
access	  rights.	  Project	   leaders	  are	  held	  responsible	   for	  ensuring	   implementation	  
of	  data	  management	  policy	  at	  project	  level	  while	  IWMI	  theme	  leaders	  ensure	  that	  
all	   projects	   under	   the	   themes	   are	   complying	   with	   the	   policy.	   IWMI’s	  
Implementation	   Framework	   for	   research	   data	   management	   follows	   that	  
developed	  and	  used	  by	  ICRAF	  (2008)	  (IWMI	  2011).	  
IWMI	   has	   a	   dedicated	  Geo-­‐informatics	  Unit	   (GRandD)	  which	   is	   responsible	   for	  
developing	   data	   standards,	   protocol	   and	   training	   to	   researchers	   and	   research	  
assistants	   to	   use	   these	   standards.	   The	   unit	   provides	   research	   support	   for	  
database	  design,	  data	  organization,	  data	  manipulation,	  metadata	  preparation	  and	  
data	   archiving.	   The	   GRandD	   unit	   coordinates	  with	   projects	   for	   uploading	   data	  
into	  the	  central	  repository,	  making	  them	  available	  to	  appropriate	  users	  through	  
its	  Water	   Data	   Portal,	   an	   integrated	   portal	   for	   consulting	   and	   accessing	   IWMI	  
research	  data.	  
World	  Agroforestry	  Centre	  (ICRAF)	  
ICRAF	  has	  a	  Research	  Methods	  Group	  as	  part	  of	   their	  global	  support	  units.	  The	  
group	   provides	   services	   all	   along	   the	   research	   cycle	   from	   development	   of	  
conceptual	  frameworks	  consisting	  of	  problem	  definition,	  hypothesis,	  models	  and	  
research	   questions,	   well-­‐documented	   research	   designs	   and	   methods,	   data	  
management	  and	  curation	  and	  statistical	  analysis.	  At	  present	  the	  group	  consists	  
of	  three	  statisticians	  and	  three	  data	  base	  managers	  at	  Headquarter.	  	  The	  group	  is	  
PAGE	  -­‐	  14	  -­‐	  
further	  supported	  through	  3	  regional	  data	  managers,	  with	  further	  recruitments	  
in	  the	  regions	  line	  up	  for	  2014.	  	  
In	   2012	   about	   65%	  of	   the	  work	  was	   related	   to	   data	  management	   and	   35%	   to	  
research	  design	  and	  data	  analysis.	  	  
Since	  November	  2012	  ICRAF	  has	  a	  research	  data	  management	  policy	  that	  makes	  
it	  mandatory	   for	   scientists	   to	  provide	  open	  access	   to	   all	   relevant	  primary	  data	  
that	   are	   accompanying	   their	   scientific	   publication.	   Projects	   are	   responsible	   to	  
ensure	   that	   research	   data	   is	   described	   by	   appropriated	   Metadata	   throughout	  
their	   lifecycle	   and	   are	   required	   to	   have	   all	   their	   datasets	   submitted	   to	   the	  
repository	  upon	  closure.	  	  Metadata	  are	  move	  to	  the	  open	  domain	  as	  soon	  as	  they	  
have	   been	   compiled,	   the	   actual	   data	   sets	   are	   only	   released	   once	   the	   scientists	  
give	  their	  approval.	  
ICRAF	  is	  using	  Dataverse	  as	  its	  data	  repository,	  which	  was	  released	  on	  October	  
2011	  2011.	   	   As	   of	   September	  2013	  we	  have	  96	   studies	   in	   our	  Dataverse.	   	   The	  
usage	   is	   still	   largely	   internal,	   especially	   for	   ongoing	   projects.	   From	   our	  
experience	   in	   the	   last	   20	   month	   we	   find	   that	   projects	   are	   more	   likely	   to	  
implement	  the	  policy	  if	  they	  have	  access	  to	  a	  data	  manager	  in	  the	  same	  office.	  To	  
share	  datasets	  and	  metadata	  with	  partner	  organization	  ICRAFs	  prime	  platform	  is	  
its	   GeoPortal.	   It	   is	   primarily	   designed	   to	   provide	   researchers	  with	   secure	   data	  
storage,	  sharing	  and	  visualization	  options	  through	  its	  Web	  Mapping	  Application.	  
Ultimately,	  the	  GeoPortal	  will	  be	  a	  full-­‐fledged	  online	  GIS	  tool	  with	  a	  number	  of	  
features	   for	   visualization,	   data	   management	   and	   spatial	   modeling.	   Being	  
designed	  using	  exclusively	  open	  source	  platforms	  and	  tools	  it	  is	  being	  used	  by	  an	  
increasing	   number	   of	   CGIAR	   scientists,	   as	   well	   as	   scientists	   from	   partnering	  
institutions.	  
WorldFish	  
At	   WorldFish,	   the	   Research	   Data	   Management	   Project	   (RDMP),	   together	   with	  
three	   research	   support	   people,	   is	   adding	   value	   to	   data	   collected	   by	   WF	   staff	  
across	  all	  our	  offices.	  The	  goal	  of	   the	   initiative	   is	   to	  make	  all	  data	  produced	  by	  
Worldfish	  	  available	  firstly	  within	  Worldfish	  and	  perhaps	  more	  widely	  given	  the	  
necessary	   permissions.	   Project	   Leaders	   are	   required	   to	   submit	   well	   organized	  
and	  well-­‐described	  data	   files	  after	   their	  projects	   terminate.	   	  The	  RDMP	  team	  is	  
trying	  to	  collate	  these	  individual	  data	  sets	  into	  a	  relational	  database	  that	  allows	  
global	   analysis.	  WFA	   also	   just	   set	   up	   a	   GIS	   and	   data	   helpdesk	   for	   staff	   to	   use.	  
‘Clients’	   can	   request	   specific	   data,	   graphs,	   maps	   and	   other	   data	   products.	  
Additionally	  the	  team	  also	  manages	  ReefBase	  and	  The	  Coral	  Triangle	  Atlas	  which	  
incorporate	  online	  geo-­‐referenced	  data.	  
Success	  stories	  
Some	  example	  success	  stories	  of	  CGIAR	  open	  data:	  
AgTrials	  
AgTrials:	   http://www.agtrials.org/	   is	   an	   information	   portal	   developed	   by	   the	  
CGIAR	   Research	   Program	   on	   Climate	   Change,	   Agriculture	   and	   Food	   Security	  
(CCAFS)	  which	  provides	  access	  to	  a	  database	  on	  the	  performance	  of	  agricultural	  
technologies	   at	   sites	   across	   the	   developing	   world.	   It	   builds	   on	   decades	   of	  
evaluation	  trials,	  mostly	  of	  varieties,	  but	  includes	  any	  agricultural	  technology	  for	  
PAGE	  -­‐	  15	  -­‐	  
developing	  world	  farmers.	  This	  project	  will	  standardize	  data	  and	  information	  to	  
the	   benefit	   of	   climate	   change	   analyses,	   future	   multi-­‐environment	   trials	   and	  
research	   and	   development	   in	   international	   agriculture.	  With	   the	   interface	   you	  
can	   share	   data	   and	   information	   on	   evaluations	   of	   agricultural	   technology;	  
acquire	   agricultural	   evaluation	   data	   sets	   for	   your	   own	   research;	   explore	   the	  
geographic	  dimensions	  of	  agricultural	  evaluation.	  
ASTI	  
ASTI:	   Agricultural	   Science	   and	   Technology	   Indicators	   www.asti.cgiar.org	   is	   a	  
database	   that	   provides	   up-­‐to-­‐date	   do	   quantitative	   and	   qualitative	   data	   on	  
investment,	   capacity,	   and	   institutional	   trends	   in	   agricultural	   research	   and	  
development.	  The	  dataset	  are	  collected	  and	  updated	  annually.	  ASTI	  data	  are	  used	  
in	  informing	  policy	  formulation	  and	  decision	  making	  in	  many	  countries	  of	  Africa	  
and	  South	  Asia.	  	  
Ethiopia	  Rural	  Household	  Surveys	  
Ethiopia	  Rural	  Household	  Surveys	  (Hoddinott	  and	  Yohannes	  (2011).	  This	  dataset	  
has	  resulted	  in	  many	  publications	  since	  its	  release	  in	  2011.	  This	  dataset	  has	  been	  
cited	  at	  least	  10	  times.	   	  Furthermore,	  the	  publications	  produced	  with	  the	  use	  of	  
this	   dataset	   have	   also	   informed	   policy	  making	   in	   Ethiopia.	  We	   receive	   a	   lot	   of	  
personal	  request	  to	  provide	  this	  dataset	  as	  well	  in	  addition	  to	  web	  downloads.	  
Chronic	  Poverty	  and	  Long	  Term	  Impact	  Study	  in	  Bangladesh	  
Chronic	  Poverty	   and	  Long	  Term	   Impact	   Study	   in	  Bangladesh	   (Quisumbing	   and	  
Baulch,	   2010):	   According	   to	   Thomson	   Reuter’s	   Data	   Citation	   Index,	   this	  
particular	  dataset	   has	  been	   cited	  more	   than	  24	   times	   since	   its	   release	   in	  2010	  
and	  downloaded	  3479	  times.	  
Land	  Degradation	  Surveillance	  Framework	  
Land	  Degradation	  Surveillance	  Framework	  (LDSF),	  
http://gsl.worldagroforestry.org/?q=node/239,	  is	  a	  sampling	  protocol	  designed	  
around	  a	  spatially	  stratified,	  randomized	  sampling	  design,	  to	  provide	  a	  
biophysical	  baseline	  at	  landscape	  level.	  The	  LDSF	  was	  developed	  at	  the	  World	  
Agroforestry	  Centre	  for	  landscape	  level	  assessments	  and	  studies	  of	  carbon	  
dynamics,	  vegetation	  changes,	  soil	  functional	  properties	  and	  soil	  hydrological	  
properties.	  The	  LDSF	  has	  been	  implemented	  in	  more	  than	  20	  countries	  in	  Africa	  
to	  date,	  including	  the	  CIAT-­‐led	  Africa	  Soil	  Information	  Service	  (AfSIS)	  project.	  
The	  methodology	  has	  been	  shown	  to	  be	  appropriate	  for	  studies	  of	  land	  health	  
and	  land	  degradation	  risk,	  as	  well	  as	  for	  assessing	  soil	  organic	  carbon	  dynamics	  
in	  rangeland	  systems.	  
Poverty	  Environmental	  Network	  
Poverty	  Environmental	  Network	  PEN.	  Launched	  in	  2004,	  PEN,	  
http://www.cifor.org/pen,	  is	  the	  largest	  and	  most	  comprehensive	  global	  analysis	  
of	  tropical	  forests	  and	  poverty.	  Its	  database	  contains	  survey	  data	  on	  8000+	  
households	  in	  40+	  study	  sites	  in	  25	  developing	  countries.	  At	  the	  core	  of	  PEN	  is	  
comparative,	  detailed	  socio-­‐economic	  data	  that	  was	  collected	  quarterly	  at	  the	  
household	  and	  village	  level	  by	  50+	  research	  partners	  using	  standardized	  
definitions,	  questionnaires	  and	  methods.	  
PAGE	  -­‐	  16	  -­‐	  
Reefbase,	  and	  the	  CT-­‐Atlas	  
Reefbase,	  and	  the	  CT-­‐Atlas	  (http://www.reefbase.org/main.aspx;	  
http://ctatlas.reefbase.org/)	  are	  online	  GIS	  database	  systems	  developed	  by	  
WorldFish	  and	  partners.	  This	  year	  they	  have	  been	  recognized	  by	  the	  Thematic	  
Working	  Groups	  of	  the	  Coral	  Triangle	  Initiative	  as	  their	  official	  data	  storage	  and	  
retrieval	  tools.	  	  These	  databases	  are	  improving	  the	  regional	  coordination	  of	  
conservation	  and	  management	  activities	  in	  the	  Coral	  Triangle	  region.	  The	  CT-­‐
Atlas	  will	  be	  maintained	  in	  future	  with	  funding	  from	  CCAFS,	  ADB	  and	  the	  IAEA	  
(International	  Atomic	  Energy	  Agency)	  and	  will	  start	  to	  include	  more	  socio-­‐
economic	  datasets,	  and	  be	  directed	  more	  towards	  the	  examination	  of	  food	  
security	  issues.	  	  
Longitudinal	  Village	  Level	  Studies	  
The	   longitudinal	   Village	   Level	   Studies	   (VDSA),	   	   http://vdsa.icrisat.ac.in/vdsa-­‐
vls.htm,	  	  of	  ICRISAT	  have	  for	  over	  three	  decades	  provided	  profound	  insights	  into	  
the	  social	  and	  economic	  changes	   in	  the	  village	  and	  household	  economies	   in	  the	  
semi-­‐arid	  tropics	  of	  Asia	  and	  Africa.	  Over	  150	  research	  papers	  and	  more	  than	  40	  
doctoral	  dissertations	  have	  been	  based	  on	  empirical	  analysis	  of	  VLS	  data	  in	  the	  
semi-­‐arid	   tropics	   of	   India	   and	  West	   Africa.	   A	   recent	   search	   in	   Google	   scholar	  
shows	  that	  this	  body	  of	  work	  has	  generated	  over	  10,000	  citations.	  
Cassavabase	  
Cassavabase:	   The	   Next	   Generation	   Cassava	   Breeding	   (NEXTGEN	   Cassava)	  
project,	   implemented	   in	   collaboration	  with	  Cornell	  University,	  has	  developed	  a	  
database	   which	   gives	   The	   database	   (www.cassavabase.org)	   contains	   Genomic	  
Selection	  algorithms	  and	  analysis	   capacity,	   a	   cassava	  genome	  browser,	   cassava	  
ontology	  tools,	  phenotyping	  tools,	  and	  social	  networking.	  Tools	  are	  developed	  on	  
Cassavabase	   that	   improve	   partner	   breeding	   program	   information	   tracking,	  
streamline	  management	  of	  genotypic	  and	  phenotypic	  data,	  and	  pipeline	  that	  data	  
through	  Genomic	  Selection	  prediction	  analyses.	  By	  the	  project	  end,	  Cassavabase	  
will	  be	  fully	  hosted	  at	  IITA,	  providing	  a	  "one-­‐stop	  shop"	  for	  cassava	  researchers	  
and	  breeders	  worldwide.	  
CIAT	  Geonetwork	  
CIAT	  Geonetwork	  
(http://gisweb.ciat.cgiar.org:8080/geonetwork/srv/en/about),	  the	  SRTM	  digital	  
elevation	  data	  (http://srtm.csi.cgiar.org/)	  and	  Worldclim	  
(http://www.worldclim.org)	  are	  great	  examples.	  Some	  of	  these	  datasets	  and	  
associated	  information	  has	  been	  downloaded	  by	  several	  thousand	  users.	  The	  
research	  paper	  published	  in	  2005	  (CIAT	  and	  Bioversity)	  by	  Hijmans	  et	  al	  (2005)	  
very	  high	  resolution	  interpolated	  climate	  surfaces	  for	  global	  land	  areas	  
(Worldclim)’’	  is	  one	  of	  the	  most	  cited	  article	  in	  CGIAR	  history.	  
Intergenebank	  Potato	  Database	  	  
The	  Intergenebank	  Potato	  Database	  (IPD),	  a	  database	  to	  cross-­‐reference	  potato	  
collections	  and	  share	  passport	  and	  evaluation	  data,	  since	  early	  1995.	  	  
PAGE	  -­‐	  17	  -­‐	  
SINGER	  
(Systems-­‐Wide	   Information	   Network	   for	   Genetic	   Resources):	   is	   a	   genetic	  
resources	  information	  exchange	  network	  that	  provides	  access	  to	  information	  on	  
the	   collections	   of	   genetic	   resources.	   Established	   in	   the	   1990ies	   the	   collection	  
now	  comprise	  over	  half	  a	  million	  samples	  of	  crop,	  forage	  and	  tree	  germplasm	  of	  
major	  importance	  for	  food	  and	  agriculture.	  
Barriers	  to	  mainstreaming	  data	  sharing	  
Sensitive	  data	  &	  data	  confidentiality	  
Data	  sets	  often	  have	  personal	   information	  about	  households	  or	   informants	  that	  
they	  do	  not	  expect	  to	  be	  made	  public	  and	  that	  would	  be	  unethical	  to	  make	  public,	  
given	   the	   trust	   and	   rapport	   between	   researcher	   and	   ‘informant’.	   	   Sensitive	  
research	   data	   include	   data	   on	   illegal	   activities,	   corruption,	   land	   tenure	   and	  
conflicts,	   controversial	   governmental	   policies,	   the	   location	   of	   wild	   and	  
domesticated	   plant	   genetic	   resources	   with	   particularly	   valuable	   traits.	   Several	  
countries	  have	  some	  form	  of	  data	  protection	   laws	  to	  regulate	  the	  processing	  of	  
information	  relating	  to	  individuals	  and	  their	  traditional	  knowledge,	  including	  the	  
obtaining,	   holding,	   use	   or	   disclosure	   of	   such	   information.	   Very	   few	   CGIAR	  
Centers	  have	  ethical	   review	  boards	  or	  committees	   to	  ensure	   that	  data	  used	   for	  
human	  or	  behavioral	   research	  complies	  with	  ethical	   review	  standards	  and	   that	  
research	   subjects	   or	   participants	   are	   protected.	   While	   there	   are	   a	   number	   of	  
standard	   procedures	   to	   anonymise	   data	   sets,	   such	   as	   removing	   names,	  
addresses,	   and	   contact	   information	   or	   encryption	   of	   location	   data	   (GPS)	   it	  
creates	   a	   secondary	   problem	   of	   managing	   different	   data	   sets:	   one	   complete	  
version	  for	  internal	  use	  and	  the	  second	  autonomous	  data	  set	  for	  public	  use.	  	  
Diverse	  data	  sets	  and	  backlog	  
The	   interdisciplinary	   nature	   of	   the	   research	   conducted	   within	   each	   of	   the	  
Centers	  results	  in	  a	  vast	  variety	  of	  data	  formats	  and	  types.	  Automated	  workflows	  
for	   data	   verification,	   cleaning	   and	   aggregation	   need	   to	   be	   customized	   for	   each	  
project,	   resulting	   in	   high	   demand	   on	   research	   support	   staff	   time.	   In	   addition	  
global	   analyses	   across	   data	   sets	   from	   different	   projects	   requires	   relational	  
databases	   which	   are	   difficult	   to	   realize	   if	   data	   collections	   have	   not	   been	  
standardized.	  While	  across	  the	  Centers	  there	  are	  various	  affords	  to	  identify	  “key”	  
indicators	  that	  should	  be	  collected	  by	  each	  project	  or	  to	  introduce	  standardized	  
sampling	   designs	   and	   survey	   modules	   scientists	   feel	   that	   scientific	   creativity	  
should	  not	  be	  pressed	  into	  rigid	  protocols.	  	  In	  addition,	  each	  centre	  has	  a	  legacy	  
of	  high	  value	  data	  sets	  that	  have	  not	  been	  fully	  curated.	  To	  find	  sufficient	  budget	  
to	   update	   these	   datasets	   and	   to	   make	   them	   available	   within	   and	   outside	   the	  
centre	  is	  a	  problem.	  	  
Data	  ownership	  and	  recognition	  of	  data	  authors	  	  
Data	   cannot	   easily	   be	   protected	   against	   copying	   that	   work,	   or	   reproducing	   it	  
without	   authorization	   or	   attribution.	   Copyright	   applies	   not	   to	   the	   facts	   or	   the	  
information	   itself,	   but	   to	   the	   particular	   way	   the	   facts	   or	   information	   are	  
presented	   in	   the	   dataset	   or	   database.	   As	   such	   a	   database	   can	   be	   protected	   by	  
copyright,	   but	   only	   the	   database	  model	   and	   the	   data	   entry	   and	   output	   not	   the	  
PAGE	  -­‐	  18	  -­‐	  
actual	   numbers	   or	   names	   in	   the	   database.	   	   Data	   sharing	   websites	   such	   as	  
DataCite	  or	  Dataverse	  assign	  a	  persistent	  authorship	   identifier	   (URI),	   such	  as	  a	  
digital	   object	   identifier	   (DOI)	   or	   handle	   to	   data	   sets	   and	   have	   specific	   user	  
agreements.	  However,	  while	  these	  might	  be	  legally	  binding	  in	  some	  countries	  in	  
it	  not	  clear	  how	  scientist	  or	  centers	  can	  take	  legal	  actions	  against	  misuse.	  While	  
the	  ownership	  of	  the	  data	  and	  the	  right	  to	  reproduce	  the	  work	  usually	  belongs	  to	  
the	   centers,	   scientists	   are	   given	   authorship	   rights.	   Unlike	   scientific	   publication	  
there	  are	  no	  standard	  guidelines	  regarding	  data	  authorship.	  	  Centers	  and	  project	  
managers	  need	  to	  ensure	  that	  both	  technical	  as	  well	  as	  scientific	  staff	  are	  given	  
the	   deserved	   credit	   for	   their	   work,	   thus	   all	   people	   that	   have	   substantially	  
contributed	  to	  the	  creation	  of	  the	  datasets	  should	  be	  data	  authors.	  	  
For	   projects	   that	   consist	   of	   multiple	   datasets	   produced	   by	   different	   teams	   of	  
scientists,	  decisions	  need	  to	  be	  made	  about	  assigning	  the	  authorship.	  Assigning	  
the	   same	  authorship	   (the	   same	  persistent	   identifier)	   to	   all	   project	   related	  data	  
ensures	   the	   coherence	   of	   the	   datasets,	   but	   it	   does	   not	   allow	   differentiating	  
between	   the	   different	   contributions	   of	   scientists,	   which	   is	   problematic	   with	  
respect	   to	   accountability.	   The	   same	   issues	   arise	   for	   dynamic	   datasets	   such	   as	  
R&D	   databases	   and	   trial	   data	   that	   data.	   	   Other	   organizations	   with	   a	   stronger	  
mandate	  to	  produce	  global	  data	  sets	  as	  public	  goods	  such	  as	  the	  OECD	  and	  FAO	  
attributing	   institution/program	   as	   the	   data	   authors	   for	   dynamic	   datasets,	  
individual	  contribution	  are	  recognized	  within	   list	  of	   contributors	  or	  mentioned	  
in	  acknowledgement.	  Assigning	  authorship	   for	  value	  –added	  secondary	  dataset	  
is	  more	  complicated.	   	   If	  a	  person	  or	  an	  organization	  provided	  data,	  should	  they	  
be	   included	   as	   co-­‐author?	   Based	   on	   substantial	   contribution	   definition	   of	   data	  
authorship,	   the	   input	   (secondary	   dataset)	   is	   enough	   for	   consideration	   for	   the	  
data	   authorship.	   Several	   centers	   have	   policies	   and	   guidance	   on	   authorship	  
although	   some	   of	   the	   details	   above	   are	   not	   always	   adequately	   covered	   in	   the	  
guidance.	  
Institutional	  culture	  	  
Under	   the	   previous	   structure	   of	   CGIAR,	   the	   former	   Science	   Council	   conducted	  
annual	  evaluations	  of	  two	  performance	  indicators	  for	  CGIAR	  research:	  outcomes	  
and	   ex-­‐post	   impact.	   Output	   indicators	   strongly	   focused	   on	   a	   quantitative	  
publication	  matrix	   and	  were	  directly	   linked	   to	   allocation	  of	   funding.	  Until	   now	  
annual	  performance	  evaluation	  of	  scientists	   focus	  strongly	  on	  publication	  rates	  
and	  the	  Hirsch-­‐index2is	  used	  as	  a	  standard	  indicator	  of	  scientific	  performance	  in	  
recruitment	  procedures.	  
When	  evaluating	  research	  a	  clear	  distinction	  should	  be	  made	  between	  research	  
‘quality’	   (i.e.	   the	   relative	  excellence	  of	  academic	  outputs	   intended	   for	  academic	  
consumption,	   e.g.	   journal	   papers	   and	   books)	   and	   research	   ‘impact’	   (i.e.	   the	  
benefits	   that	   research	  outcomes	  produce	   for	  wider	   society).	  Unfortunately	   this	  
division	   is	   often	   confused,	   a	   prime	   example	   being	   when	   journal	   citation	  
(‘quality’)	  metrics	   are	   incorrectly	  presented	   as	  measures	   of	   ‘impact’	   (Donovan,	  
2011).	   	  Even	  when	   journal	   citations	  are	  used	  correctly	  as	  a	  measure	  of	  quality	  
CGIAR	  Centers	  need	  to	  be	  critical	  about	  what	  is	  to	  be	  measured.	  Publications	  are	  
	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  	  
2	  The	  h-­‐index	   is	   an	   index	   that	   attempts	   to	  measure	   both	   the	   productivity	   and	  
impact	  of	  the	  published	  work	  of	  a	  scientist	  or	  scholar.	  
PAGE	  -­‐	  19	  -­‐	  
usually	   reviewed	  based	  on	   their	   content,	   their	  originality	  and	   the	  way	  analysis	  
and	   interpretation	   of	   the	   data	   or	   information	   is	   presented.	   Publications	   are	  
seldom	  evaluated	  based	  on	  the	  technical	  rigor	  of	  the	  data	  collection	  procedures,	  
the	   completeness	   of	   the	   data	   and	   its	   description,	   and	   alignment	   with	   existing	  
community	   standards.	  To	   translate	   conceptual	   frameworks	   into	   empirical	  
sampling	   designs	   takes	   significant	   research	   experience.	   Thus	   producing	   a	   high	  
value	  data	  set	  that	  forms	  the	  basis	  of	  a	  high	  quality	  scientific	  publication	  requires	  
a	  high	  level	  of	  scientific	  sophistication,	  whereas	  writing	  the	  paper	  itself	  requires	  
a	   good	   grasp	   of	   language,	   some	   understanding	   of	   the	   science	   you're	   writing	  
about,	  and	  an	  ability	   to	  "translate"	   technical	   information	   into	  plain	  English	  and	  
write	   about	   it	   compellingly	   (Costandi,	   2013).	   	   An	   institutional	   culture	   that	  
simplifies	  research	  quality	  to	  counts	  of	  publications	  and	  number	  of	  technologies	  
released	  does	  not	  nurture	  the	  sharing	  of	  data,	  but	  cultivates	  protectionism	  were	  
data	  is	  viewed	  as	  the	  intellectual	  property	  of	  individual	  researchers.	  	  
Exclusion	   of	   data	   preparation	   and	   publication	   from	   the	   research	   project	  
lifecycle	  
Work	   at	   the	   centers	   is	   still	   very	   much	   project	   driven	   despite	   the	   reform	   that	  
aimed	   at	   strengthened	   and	   coordinated	   funding	   mechanisms	   linked	   to	   the	  
System	   agenda	   and	   priorities.	   As	   such	   project	   lifecycles	   are	   shaped	   by	   grant	  
requirements,	  which	  do	  not	  usually	  see	  data	  as	  part	  of	  the	  project	  deliverables.	  
While	  projects	  adequately	  account	  for	  the	  costs	  of	  conducting	  research,	  including	  
the	   collection	   and	   analysis	   of	   research	   data,	   they	   seldom	   include	   preparing	   of	  
data	  and	  metadata	  and	  curation	  as	  part	  of	  the	  costs	  of	  the	  research	  process	  led	  
alone	   long-­‐term	  costs	   for	  data	  storage	  and	  preservation	  beyond	   the	   immediate	  
lifetime	   of	   a	   project.	   Projects	   are	   usually	   considered	   closed	   when	   all	   grants	  
requirements	   have	   been	   met,	   leaving	   preparing	   of	   data	   and	   metadata	   and	  
curation	   unaccounted	   and	   unbudgeted	   for.	   Metadata	   collection	   and	   proper	  
documentation,	  especially	  of	  multi-­‐country	  data	  sets	  is	  difficult	  to	  outsource	  and	  
overstretched	  scientists	  are	  often	  unable	  to	  allocate	  the	  necessary	  time.	  	  As	  data	  
collection,	   if	   not	   specifically	   stated	   as	   project	   output,	   is	   often	   considered	   as	   a	  
means	   to	   an	   end,	   especially	   in	   projects	   with	   a	   strong	   development	   focus,	   the	  
necessary	   scientific	   scrutiny	   for	   sampling	   design	   and	   theoretical	   framework	   is	  
often	  missing,	  limiting	  the	  value	  and	  re-­‐use	  of	  the	  data.	  A	  clear	  recognition	  of	  the	  
value	   of	   data	   in	   developmental	   work	   and	   a	   clear	   mandate	   to	   make	   the	   data	  
available	  will	  ensure	  that	  projects	  will	  use	  their	  scientific	  expertise	  to	  create	  high	  
value	  data	  sets.	  	  
Disconnect	  between	  libraries	  and	  data	  management	  units	  
Research	  libraries	  are	  not	  only	  places	  to	  keep	  collections,	  but	  their	  real	  strength	  
and	   power	   lies	   in	   organizing,	   preserving,	   and	   making	   knowledge	   accessible.	  
Unfortunately,	  libraries	  and	  data	  units	  in	  the	  Centres	  are	  positioned	  at	  opposite	  
ends	  of	   the	  research	   lifecycle:	  data	  management	  units	  have	  been	  established	  to	  
help	   researchers	   collect	   and	   process	   their	   data,	   and	   libraries	   to	   support	   the	  
publications	   that	   result	   from	   research	  projects.	   Libraries	   also	  help	   arrange	   the	  
search	  for	  publications	  as	  the	  basis	  for	  new	  research.	  	  While	  both	  are	  considered	  
to	   be	   research	   support	   units,	   libraries	   are	   often	   anchored	   with	   the	  
communication	  units	  and	  data	  management	  with	   the	   research	  units;	   each	  with	  
its	   own	   directorates,	   reporting	   lines	   and	   working	   culture.	   With	   increasing	  
numbers	   of	   journals	   requesting	   authors	   to	  make	   the	   replication	   data	   available	  
PAGE	  -­‐	  20	  -­‐	  
and	   the	   rise	   of	   data	   papers	   the	   functional	   boundaries	   of	   librarian	   and	   data	  
managers	   is	   becoming	   blurred.	  With	   the	   convergence	   of	   and	   interdependency	  
between	  both	  data	  and	  publications,	  the	  distinctive,	  but	  complimentary	  skill	  sets	  
of	   both	   units	   are	   needed	   to	   safeguard	   data	   availability,	   discoverability,	  
interpretability	  and	  re-­‐useability.	  	  	  	  
	  
Relative	  benefits,	  as	  opposed	  to	  costs,	  of	  data	  publishing	  not	  clear	  
CGIAR	  has	  the	  mandate	  to	  conduct	  and	  facilitate	  research	  in	  development	  with	  a	  
clear	   target	   to	   reduce	   poverty	   and	   hunger.	   While	   there	   are	   many	   institutions	  
with	  a	  similar	  mandate,	  very	  few	  have	  a	  similar	  global	  outreach	  and	  the	  capacity	  
to	   collect	   information	   and	  data	   on	   a	   variety	   of	   food	  production	   systems	   at	   the	  
household	  or	  farm	  level.	  While	  it	  seems	  intuitively	  that	  making	  these	  information	  
and	   data	   available	   would	   lead	   to	   more	   rapid	   advances	   in	   developmental	  
questions,	   it	   is	   less	   clear	   what	   the	   impact	   pathways	   of	   data	   sets	   would	   be.	  	  
Publishing	  research	  data	  comes	  with	  an	  increase	  cost,	  both	  at	  the	  project	  level	  as	  
well	   as	   at	   the	   center	  management	   level.	   	   Both	   centers	   and	   donors	   need	   to	   be	  
assured	  of	  the	  benefits	  of	  money	  spend	  on	  data	  publishing.	   	  Rather	  than	  simply	  
assuming	   that	  open	  access	   is	  beneficial	   to	   the	  work	  of	  CGIAR	  a	   clear	   theory	  of	  
change	  with	  an	  appropriate	  indicator	  matrix	  is	  needed	  to	  allow	  for	  a	  cost	  benefit	  
analysis	   on	   the	   contribution	   of	   data	   sets	   to	   CGIAR’s	   overall	   impact	   along	   the	  
research	  in	  development	  continuum.	  	  	  
Recommendations	  
Clear	  mandate	  to	  include	  data	  as	  research	  outputs	  
The	  mandate	  to	  publish	  research	  data	  as	  research	  outputs	  needs	  to	  be	  explicitly	  
stated	  in	  the	  Strategy	  and	  Results	  Framework	  (SRF),	  which	  sets	  common	  goals,	  
strategic	  objectives	  and	  results	  to	  be	  jointly	  achieved	  by	  CGIAR	  and	  its	  partners.	  
Centers	  and	  donors	  have	  to	  recognize	  data	  sets	  as	  important	  information	  assets	  
and	  project	  deliverables.	  Only	   if	  data	   sets	  are	   included	   in	   the	  performance	  and	  
impact	   evaluation	   of	   CGIAR,	   will	   the	   centers	   be	   in	   a	   position	   to	   provide	  
incentives	  for	  scientists	  to	  allocate	  sufficient	  time	  to	  data	  quality	  at	  all	  stages	  of	  
the	   research	   cycle;	   from	   development	   of	   conceptual	   frameworks	   consisting	   of	  
problem	   definition,	   hypothesis,	   models	   and	   research	   questions,	   well-­‐
documented	   research	   designs	   and	  methods,	   data	  management	   and	   curation	   to	  
statistical	  analysis.	  	  
CGIAR	  Fund	  allocation	  to	  support	  cross-­‐center	  collaboration	  
While	   research	   data	   are	   produced	   at	   centre	   level	   and	   shaped	   by	   the	   research	  
emphasis	   and	   character	   of	   each	   centre	   there	   are	   overarching	   data	   publishing	  
issues	   relevant	   to	   all	   research	   data	   and	   all	   Centers.	   Specific	   CGIAR	   Fund	  
allocations	  need	  to	  be	  made	  available	  to	  facilitate	  cross-­‐center	  collaboration	  and	  
information	  sharing	  with	  respect	  to	  data	  management	  and	  publishing.	  Research	  
activities	   in	   CGIAR	   are	   funded	   through	   the	   CGIAR	   Fund,	   a	   new	   multi-­‐donor,	  
multi-­‐year	   funding	   mechanism.	   It	   finances	   research	   aligned	   with	   the	   Strategy	  
and	  Results	   Framework	  developed	  by	   the	   CGIAR	  Consortium	  and	   endorsed	   by	  
the	   Funders	   Forum	   to	   establish	   common	   goals,	   objectives	   and	   results	   for	   the	  
PAGE	  -­‐	  21	  -­‐	  
CGIAR	  partnership.	  The	  Fund	  is	  already	  facilitating	  cross-­‐center	  collaboration	  in	  
addressing	   key	   research	   questions	   (i.e.	   gender,	   capacity	   development	   and	  
communications)	   via	   the	   strategic	   research	   programs,	   but	   cross	   cutting	   issues	  
like	  data	  sharing	  do	  not	  have	  a	  home	  yet.	  	  
Data	  Policies	  at	  Centre	  level	  
Each	  Center	  should	  have	  a	  research	  data	  management	  policy	  that	  is	  aligned	  with	  
the	  CGIAR	  Principles	  on	  the	  Management	  of	  Intellectual	  Assets,	  the	  CGIAR	  Open-­‐
Access	   Policy	   and	   consistent	   with	   the	   aims	   of	   public	   funded	   research.	   The	  
Policies	   need	   to	   lay	   out	   the	   basic	   principles	   of	   research	   data	   management	   at	  
center	   level,	  address	  what	  data	  can	  be	  made	  available	   to	   the	  public	  when,	  how	  
and	   by	   whom.	   Roles	   and	   responsibilities	   of	   individual	   scientists,	   projects	   and	  
research	   support	   units	   towards	   this	   afford	   need	   to	   be	   stated.	   Legal	   clarity	   on	  
licenses,	   data	   ownership	   and	   authorships	   should	  be	   addressed.	   Policies	   should	  
be	   accompanied	   by	   flexible	   implementation	   guidelines	   that	   ensure	   that	   data	   is	  
used	  by	  the	  Center	  and	  its	  partners	  in	  the	  most	  efficient	  way	  while	  safeguarding	  
that	  scientists	  are	  given	  sufficient	  time	  to	  produce	  the	  scientific	  publications	  they	  
set	  out	  to	  do.	  	  
Ethical	  committee	  to	  be	  established	  in	  all	  Centers	  
Each	  Center	  should	  have	  an	  ethical	  committee	  that	  is	  responsible	  for	  developing	  
the	  centers	  research	  ethic	  guidelines	  and	  policies	  addressing	  sensitive	  data	  and	  
data	   confidentiality,	   as	  well	   as	  appropriate	  handling	  of	   research	  data.	   	  Projects	  
should	  be	  reviewed	  based	  on	  their	  adherence	  to	  the	  accepted	  ethical	  standards	  
of	  a	  genuine	  research	  study.	  	  
Clear	  guidelines	  on	  authorship	  attribution	  
To	   ensure	   that	   both	   technical	   as	   well	   as	   scientific	   staff	   is	   given	   the	   deserved	  
credit	   for	   their	   work	   all	   people	   that	   have	   substantially	   contributed	   to	   the	  
creation	   of	   the	   datasets	   should	   be	   data	   authors.	   This	   includes	   all	   people	   that	  
played	  a	  key	  role	  in	  the	  following:	  	  
1) Conceiving	   and	  designing	   the	   field	  work	   in	   response	   to	   questions	   of	  
recognized	  scientific	   importance	  and/or	  relevance	  for	  developmental	  
impact	  and	  policy	  change.	  
2) Development	   and	   implementation	   of	   research	   designs,	   choice	   of	  
methods,	  quality	  control	  on	  data	  collection.	  
3) Database	  design,	  data	  cleaning,	  validation	  and	  verification	  processes	  	  
Zero	  tolerance	  of	  scientific	  fraud	  
Each	   Centre	   needs	   to	   implement	   zero	   tolerance	   policies	   with	   respect	   to	   data	  
manipulation	   and	   should	   have	   explicit	   standards	   regarding	   the	   appropriate	  
handling	  of	  research	  data.	  	  
Adoption	  of	  OAI-­‐compliant	  data	  repositories	  across	  CGIAR	  
The	  Open	  Archives	   Initiative	   Protocol	   for	  Metadata	  Harvesting	   (OAI-­‐PMH)	   is	   a	  
low-­‐barrier	  mechanism	   for	   repository	   interoperability.	   To	   enable	   the	   broadest	  
level	  of	  interoperability,	  OAI-­‐PMH	  mandates	  that	  metadata	  should	  be	  exposed	  as	  
Dublin	  Core.	  Data	  Providers	  are	  repositories	  that	  expose	  structured	  metadata	  via	  
PAGE	  -­‐	  22	  -­‐	  
OAI-­‐PMH.	  Service	  Providers	  then	  make	  OAI-­‐PMH	  service	  requests	  to	  harvest	  that	  
metadata.	  By	  using	  OAI-­‐PMH	  compliant	  repositories	  centers	  can	  have	  their	  own	  
individual	   institutional	   repositories	   each	   with	   their	   own	   particular	   collection	  
policies	   and	   administrative	   systems,	   but	   to	   be	   linked	   into	   one	   large,	   a	   virtual,	  
global	  repository	  through	  the	  use	  of	  the	  OAI-­‐PMH.	  	  
Linking	  data	  and	  publications	  
To	   improve	   scientific	   publications,	   consensus	   with	   scientific	   peers	   and	   public	  
trust	   in	   the	   quality	   of	   our	   research	   outputs	   each	   Centre	   should	   make	   all	  
necessary	  raw	  data	  public	  to	  reproduce	  or	  replicate	  every	  scientific	  publication	  
that	  is	  based	  on	  research	  data.	  	  
Building	  libraries	  capacity	  for	  data	  curation	  	  
The	  Centers	  libraries	  need	  to	  embrace	  the	  challenges	  that	  come	  with	  publishing	  
research	   data.	   Libraries	   need	   to	   support	   data	   units	   in	   their	   efforts	   to	   publish	  
research	   data	   by	   providing	   persistent	   identification/citation	   of	   datasets,	  
guidance	  on	  authorships	  and	  solutions	  for	  data	  description,	  documentation	  and	  
retrieval,	   which	   together	   facilitate	   findability	   (Reilly,	   2012).	   They	   must	   also	  
ensure	   long-­‐term	  data	  archiving,	   including	  data	   curation	  and	  preservation	  as	  a	  
condition	  for	  data	  interpretability	  and	  re-­‐usability.	  The	  Centers	  need	  to	  invest	  in	  
developing	   the	   staff	   skills	   required	   for	   achieving	   the	   data	   curation	   role	   in	   the	  
libraries	   and	   need	   to	   of	   recruit	   library	   staff	   with	   experience	   in	   research	  
disciplines.	  
Specific	  funds	  to	  publish	  legacy	  data	  
Legacy	   data	   sets	   that	   have	   high	   potential	   to	   contribute	   towards	   achieving	   the	  
four	  system	  level	  outcomes	  should	  be	  archived	  and	  published.	  Most	  of	  these	  data	  
sets	  are	  fully	  documented,	  however	  documentation	  and	  data	  formats	  need	  to	  be	  
brought	   in	   line	  with	   today’s	   requirements	   and	   standards.	  To	  ensure	   that	   these	  
data	  sets	  can	  be	  made	  available	  to	  CGIAR	  scientists	  and	  partners,	  working	  in	  the	  
CGIAR	  Research	  Programs,	  specific	  financial	  incentives	  need	  to	  be	  provided.	  	  	  	  
Changing	  institutional	  culture	  
Performance	  evaluations	  both	  at	  individual	  scientist	  level	  as	  well	  as	  center	  level	  
need	  to	  shift	  from	  using	  simplistic	  indicators	  metrics	  such	  as	  numbers	  of	  papers,	  
positions	   in	   lists	  of	  authors,	  and	   journals’	   impact	   factors	   towards	  assessing	   the	  
quality	  of	  research	  itself.	  Centers	  and	  science	  managers	  need	  to	  put	  performance	  
indicators	   in	   place	   that	   not	   only	   reward	   the	   excellent	   scientific	   writers	   the	  
system	   has,	   but	   also	   the	   scientific	   and	   technical	   excellence	   that	   leads	   to	   the	  
creation	  of	  the	  data,	  methods	  and	  ideas	  that	  are	  supposed	  to	  be	  communicated	  in	  
the	  papers.	  Internal	  project	  reviews	  should	  take	  into	  account	  the	  technical	  rigor	  
of	   the	   data	   collection	   procedures,	   the	   completeness	   of	   the	   data	   and	   its	  
description,	   and	   alignment	   with	   existing	   community	   standards.	  	   Scientists	   and	  
their	  field	  teams	  should	  be	  encouraged	  to	  produce	  peer-­‐reviewed	  data	  papers.	  
	  
	  
	   	  
PAGE	  -­‐	  23	  -­‐	  
Key	  references	  
AfricaRice	  Dataverse	  	  http://africarice.org/warda/dataverse.asp	  	  
Alsheikh-­‐Ali	  AA,	  Qureshi	  W,	  Al-­‐Mallah	  MH,	  Ioannidis	  JPA	  (2011)	  Public	  
Availability	  of	  Published	  Research	  Data	  in	  High-­‐Impact	  Journals.	  PLoS	  ONE	  
6(9):	  e24357.	  doi:10.1371/journal.pone.0024357	  	  
Altman	  DG,	  Furberg	  CG,	  Grimshaw	  GM,	  Rothwell	  PM	  (2006).	  Lead	  editorial:	  
Trials	  –	  using	  the	  opportunities	  of	  electronic	  publishing	  to	  improve	  the	  
reporting	  of	  randomized	  trials.	  Trials,	  7:6.	  doi:	  10.1186/1745-­‐6215-­‐7-­‐6.	  
[PMC	  free	  article]	  [PubMed]	  [Cross	  Ref]	  
Anon	  (2008)	  Improving	  Research	  Data	  Management	  and	  Sharing	  in	  the	  Alliance	  
of	  CGIAR	  Centres;	  A	  Working	  Paper	  for	  Consideration	  of	  the	  CGIAR	  Alliance,	  
online	  
http://cropwiki.irri.org/gcp/images/4/40/ResearchDataManagement_CG-­‐
ADE.pdf	  	  
Bioversity	  International	  Dataverse	  (2013),	  
http://thedata.harvard.edu/dvn/dv/Bioversity	  
CIAT	  Dataverse,	  http://dvn.iq.harvard.edu/dvn/dv/CIAT	  
Costandi	  M	  (2013)	  A	  good	  story	  conveys	  wonderment,	  The	  Guardian,	  Monday	  22	  
April	  2013,	  online	  	  http://www.theguardian.com/science/2013/apr/22/mo-­‐
costandi-­‐science-­‐writing	  
Donovan	  C	  (2011)	  Impact	  is	  a	  strong	  weapon	  for	  making	  an	  evidence-­‐based	  case	  
for	  enhanced	  research	  support	  but	  a	  state-­‐of-­‐the-­‐art	  approach	  to	  
measurement	  is	  needed.	  Citations,	  REF	  2014,	  Research	  funding,	  online	  
http://blogs.lse.ac.uk/impactofsocialsciences/2011/08/22/impact-­‐strong-­‐
weaponevidence-­‐based-­‐case-­‐for-­‐enhanced-­‐research-­‐support-­‐but-­‐a-­‐state-­‐of-­‐
the-­‐art-­‐approach-­‐to-­‐measurement-­‐is-­‐needed	  
FAO/Bioversity	  International	  (2012)	  FAO/Bioversity	  Multi-­‐Crop	  Passport	  
Descriptors	  V.2	  [MCPD	  V.2].	  11p.	  	  CGIAR	  Principles	  on	  the	  Management	  of	  
Intellectual	  Assets	  (2012)	  
http://www.cgiarfund.org/sites/cgiarfund.org/files/Documents/PDF/cgiar_
principles_management_intellectual_assets_7march_2012.pdf	  	  	  
Graf	  C,	  Wager	  E,	  Bowman	  A,	  Fiack	  S,	  Scott-­‐Lichter	  D,	  Robinson	  A	  (2007)	  Best	  
practice	  guidelines	  on	  publication	  ethics:	  a	  publisher's	  perspective.	  
International	  journal	  of	  clinical	  practice	  61(152):	  1-­‐26.	  
Hartter	  J,	  Ryan	  SJ,	  MacKenzie	  CA,	  Parker	  JN,	  Strasser	  CA	  (2013)	  Spatially	  Explicit	  
Data:	  Stewardship	  and	  Ethical	  Challenges	  in	  Science.	  PLoS	  Biol	  11(9):	  
e1001634.	  doi:10.1371/journal.pbio.1001634	  	  
Hijmans	  RJ,	  Cameron	  SE,	  Parra	  JL,	  Jones	  PG,	  Jarvis	  A	  (2005)	  Very	  high	  resolution	  
interpolated	  climate	  surfaces	  for	  global	  land	  areas.	  International	  journal	  of	  
climatology	  25(15):	  1965-­‐1978.	  
Hoddinott	  J,	  Yohannes	  Y	  (2011)	  Ethiopian	  Rural	  Household	  Surveys	  (ERHS),	  
http://hdl.handle.net/1902.1/15646	  UNF:5:k2eYxsY6t/jVXblm/UAkRg==	  
International	  Food	  Policy	  Research	  Institute	  [Distributor]	  V7	  [Version]	  
ICRISAT Dataverse http://dataverse.icrisat.org/dvn 
Reilly, S (2012) The Role of Libraries in Supporting Data Exchange 
http://conference.ifla.org/past/2012/116-reilly-en.pdf. 
PAGE	  -­‐	  24	  -­‐	  
IFPRI	  Dataverse	  http://thedata.harvard.edu/dvn/dv/IFPRI	  
IMWI Water Data Portal 	  http://waterdata.iwmi.org/ 
International	  Water	  Management	  Institute	  (2011)	  Research	  Data	  Management	  
Policy	  And	  Implementation	  Guideline	  
Muraya	  P,	  Coe	  R	  (2001)	  Research	  Discussion	  Paper	  2:	  Looking	  after	  our	  
investments:	  Improving	  Research	  Data	  Management	  in	  ICRAF,	  
https://sites.google.com/a/cggmail.org/cgiar-­‐data-­‐management-­‐
meeting/resource-­‐documents	  
ODE	  Report	  on	  Integration	  of	  Data	  and	  Publications	  (2011)	  
http://www.alliancepermanentaccess.org/wp-­‐
content/uploads/downloads/2011/11/ODE-­‐
ReportOnIntegrationOfDataAndPublications-­‐1_1.pdf	  
Quisumbing	  A,	  Baulch	  B	  (2010)	  Chronic	  Poverty	  and	  Long	  Term	  Impact	  Study	  in	  
Bangladesh	  http://hdl.handle.net/1902.1/17045	  	  
UNF:5:8MUn92HhwQhRKF69wSTwaA==	  International	  Food	  Policy	  Research	  
Institute	  [Distributor]	  V5	  [Version]	  
Tenopir	  C,	  Allard	  S,	  Douglass	  K,	  Aydinoglu	  AU,	  Wu	  L,	  Wu	  L,	  Read	  E,	  Manoff	  M,	  and	  
Frame	  M	  (2011)	  Data	  Sharing	  by	  Scientists:	  Practices	  and	  Perceptions.	  PLoS	  
ONE	  6(6):	  e21101.	  doi:10.1371/journal.pone.0021101	  	  
World	  Agroforestry	  Centre	  -­‐	  ICRAF	  Dataverse	  (2011),	  
http://thedata.harvard.edu/dvn/dv/icraf	  
World	  Agroforestry	  Centre	  -­‐	  ICRAF	  Geoportal	  (2012),	  
http://geoportal.worldagroforestry.org/	  
	  
PAGE	  -­‐	  25	  -­‐