An Evolving Environment: Privacy, Security, Migration and Stewardship

Well, thank you for joining us all the way through to the end of our fall membership meeting this year. It's been quite a remarkable meeting on several levels. You may have noticed, or at least sensed, that this has been a record-setting meeting in several dimensions: not just attendees, but the number of proposals for breakout sessions, which was far in excess of anything we've seen before. We in fact had to turn away a number of breakout sessions, although I hope you feel we did well with the core of them that we offered to you at the meeting. It was certainly the largest response we've ever had to an executive roundtable, the one we held on supporting digital humanities at scale, and I'm just delighted that all of you could join us for this event.

I've got a couple of things I want to do in this session. First I want to say a couple of thank-yous; then I want to fill you in on something and say a little congratulations; and then I want to spend about an hour sharing some observations on themes and events during the past year, how those are shaping and influencing some of our thinking about our program going forward, and a couple of specific programmatic things we will be doing over the course of the remainder of the program year. I will aim to be sure that we have enough time for some questions at the end, and that people get out promptly, in enough time to make their trains, planes, and other conveyances.

So first, by way of thank-yous, I'd like to have you join me in thanking the huge number of presenters we had this year. I note that not only did we have a whole lot of sessions, but a whole lot of those sessions were in fact collaborations between two or three collaborators who worked together to put together the coverage that you saw in those breakouts. That's a lot of work, and these are really the core part of our meetings; in many ways they're about things that you are doing collectively, that you're experimenting with, that you're exploring. So
please join me in a big round of applause for all those folks.

Second, I want to say thanks to the CNI staff, who worked pretty hard to pull this meeting together; it's been a big, complicated meeting. I'd like to thank, along with that staff, the AV team that has been diligent in trying to make everything work in an increasingly large number of rooms, and I also want to particularly thank the folks from the University of Maryland iSchool who volunteered to help out a bit with some of our breakout session setup and AV. Thank you all for all of your efforts in making this run smoothly.

And now for a little congratulations. Many of you knew Paul Evan Peters, the founding director of the Coalition. When he died, one of the things that we did was establish a fellowship fund in his memory. This is a fund that historically has gone to a graduate student in library or information science who our award committee, which varies from year to year, believes echoes Paul's interests and personality: not just in terms of excellence in service and academia, but in a certain commitment to

social good through information, to making a difference in society through the availability of information and information services, and indeed a certain sense of humor and irony as well. Now, we noticed in the last few years that this was almost invariably going to PhD candidates, and one of the obvious reasons for that is that they just tend to have a little more experience, a little more that they can point to on their resumes, than masters students. A very natural sort of thing, but we were a little uncomfortable with it, in the sense that we knew that practitioners were every bit as important to Paul as scholars; indeed, he went back and forth himself in his career between practitioner and scholar regularly. And so we were really delighted that we had the resources this year, for the first time, to award two fellowships: one for a doctoral student and one limited to masters-level recipients. This year Jordan Eschler from the University of Washington received the doctoral award, and our masters-level recipient was Olivia Dorsey from the University of North Carolina at Chapel Hill. Each of these students will hold the fellowship for two years, and then we will have another round of selection.

I want to take a moment to thank the committee that worked through a lot of nominations and applications for these, and that I think chose some really fine recipients from a very, very good pool. Those committee members were Ellen Yu Borkowski from Union College, who represented EDUCAUSE; Clem Guthro from Colby, who represented CNI; Jennifer Paustenbaugh from Brigham Young University, who represented ARL; and of course Joan Lippincott, ex officio. I don't know if Jordan or Olivia are still here; if you are, stand and be applauded. Otherwise we're going to just applaud anyway. Congratulations to you.

Okay, let me get on to the main part of the program. This covers a lot of material that I normally would cover in the opening plenary, and I have to say this is one of the hardest talks I give every
year, because it needs to be wide-ranging but it also needs to be selective if we ever want to get out of here. There are so many things happening now that are so interesting and so complexly interrelated that it is really hard to thread the landscape, and so I'm going to make some observations, tell a couple of stories, and try to be selective. If your favorite key development isn't in here, think a little bit about how it connects up to the ones that I am nominating before you and how it might fit into this kind of broader program.

The place I want to start is with security and privacy. These have become absolutely pervasive kinds of issues. You saw in the opening plenary how much concerns about security and privacy, particularly security, interplay with these notions of moving services onto the net and depending on remote organizations and remote facilities much more than we have in the past. It's very important to realize that security and privacy are really separate things, although interrelated in complicated ways, and that privacy itself has become a very complicated, multi-headed thing. Historically a lot of it was about privacy from the state, and some of it was about privacy from one's neighbors. Now there is a vast, vast commercial enterprise in trading information about you and, some might argue, systematically

invading your privacy. Those are different problems that call for different solutions. There are actually a number of people who are starting to think that we have dealt with many privacy problems a bit in the wrong way, in the sense that it's hopeless to keep information secret, and what you really should be doing instead is punishing people who make nasty uses of it, rather than punishing the people who one way or another fail to keep it secret. That's a very interesting observation which I think we will see play out in various areas.

There have been certain things I would characterize as spectaculars in this area, which raised amazing numbers of questions. The Snowden revelations suggest an enormous breach of security in organizations that presumably are supposed to be among the best in the world at it, and that certainly are well-funded, shall we say, to try and maintain security. Some of the information coming out of there suggests that policy decisions were made rather systematically to undermine security in a lot of the national and international networking infrastructure, which, if it's true, and I think there is good evidence that there's at least some truth in it, suggests that we've got a lot of work to do to strengthen that infrastructure. I think most people who look at security seriously are very skeptical about the notion of selective compromise: the back door that's only used by the good guys and that the bad guys can never open. Somehow those seem to get opened by all kinds of people you don't expect, and that's been true over and over and over again in the recent history of IT.

It is worth noting that the Snowden breach is also a kind of high-water mark in a trend of another kind, one which raises all kinds of issues for archives, libraries, and research collections. Here you have yet another example of a really large and untidy database of material. This is not a short memo that somebody leaked out that
everybody can read, or that The Times can publish on its front page, after which it's well embedded in the cultural record. This isn't even something on the scale of the old Pentagon Papers from the early '70s, which was a pretty fat book of documents but nonetheless a very tractable acquisition, one which I would bet still resides in many of the research libraries here and is an integral set of source documents for people doing research in many aspects of policy, diplomacy, military history, and related things from that era. Here we've just got this big old data set which is cached away in various places. The government is still not real comfortable with this database, to the point where, as I understand it, it cannot be used as reference material in classes taken by government employees, because they would be mishandling classified documents if they did their homework. There are all kinds of strange things about this. But one wonders, with this and a number of its predecessor disclosures of large corpora: how are we going to manage these kinds of really important caches of source documents, and who is going to do it? A very, very interesting question. I'm not aware of any institution that's stepped up very clearly and said, we're going to deal with that, even though it's clearly a set of source material of very high consequence.

Leaving that piece aside, you can certainly read about any number of other security and

privacy problems in the press. I think one term that has become popular, and it's very unfortunate, is the term data breach, which kind of suggests there's this event where the bad guys come in, break down the wall, and carry off all this loot, and then you come and put the wall back up and try to track down the loot or minimize the damage. There seems to be a lot of evidence that many systems are now compromised for long periods of time: the intruders are not just in there for a quick loot, they're in there to listen, to find their way into more things, and to take a much longer-term look at what's going on. That, I think, is an important distinction and one to be mindful of. We seem to be, at least from what I read in the press, seeing a spectacular example of this with the Sony Corporation, where, at least the way I read the press accounts, they have basically lost control of their entire corporate IT infrastructure, and it may be that they have to rebuild the whole thing from scratch to get any control over it back. It's that bad. This, though, is unfortunately the new reality we run into.

I don't think that anybody has magic answers about security and privacy. But just as folks like the Internet Engineering Task Force, in light of recent events and the new threat environment, are taking an inventory of a bunch of design cases and choices and going back and dealing with things that were too much trouble to get right the first time, reassessing the strength of various algorithms and so on, we need to be doing this in our communities a little more systematically. There's been some very good material coming out of EDUCAUSE's joint security efforts with Internet2, but much of it is fairly broad-based, and I think we have some particular issues about security and privacy that concern our community. Some of it is really easy stuff. Why are we being sloppy and
sending things in the clear when we don't need to, when it's easy not to? There are design choices that were made, and this is sort of a systematic mistake that the Internet has made since day one, on the underlying assumption that it's kind of a benign world out there: who would bother to do this? Just as one case in point, it came to my attention recently that the whole infrastructure around the protocol for metadata harvesting rests on this assumption. Who would want to masquerade as a repository and inject bad data into the various harvesters? Why would they do that? I'll leave the answer to that question to you as an exercise. I would just note that all of a sudden we're now using these as major sources for maintaining inventories of research data and things of that nature, and it really would be good if those inventories were reasonably accurate. The hooks to do that kind of thing aren't in the protocol right now. Fixing this is not rocket science; it just needs to get looked at systematically. We are going to be convening, probably in February, though the exact date isn't set yet, a smallish, semi-invitational, semi-open meeting to start building a shopping list of these things that are particularly relevant to our community. I think it's high time to do this; a lot of it is easy, and it's just appropriate to focus some attention on it.

There are two things that I would further note in this area that I think are harder and more painful to deal with, and I don't know exactly what we do about them. One is the sacrifices that we need to make in order to get licenses for certain kinds of

material, particularly things that are predominantly consumer-marketplace material as opposed to, say, the output of scholarly publishers. If you just look at the sorts of compromises that public libraries have had to make in order to be able to license material for their patrons, they're pretty uncomfortable in areas like privacy, and they should be; I'm not feeling real good about some of those. I think it's often combined with bad technology choices, but this is an area where we need to really reflect long and hard, and I think we need to do it across the whole cultural memory sector.

The other thing, the last thing that I put on this agenda around privacy and security issues, or perhaps this call to build an agenda, more accurately, is levels of assurance. Levels of assurance is basically an idea that asks how rigorous you want to be with evidence in trusting someone. So you have, for example, identity credentials at various levels of assurance. Some people will issue you an ID if you have a working email box that they can send a message to, so you can click and confirm it's yours; you've all seen that for mailing lists. There are other IDs that people are a little fussier about issuing, where they want to see you in person with your passport and your mother; those are heavier-weight IDs that you can trust a little more. One of the great engineering observations is that it's always easier to do this right the first time than it is to retrofit. The bad news is that if you do it right the first time, sometimes it takes forever, and the whole system is overtaken by events, or by something that doesn't do security nearly as well. I'm struck that we are now building a whole new apparatus of author identity, of factual biography. We've seen little things around the edges, some experiments in verifiable citation, so that when someone claims they published this in that journal, you can check it out easily. But there
is no agreement, I think, and indeed little discussion, about what our expectations are of the system: whether we want things to be strongly verifiable, or whether we're going to basically assume that people are telling the truth and that the people who don't will get outed eventually when they overreach. Part of the problem here is trying to understand how widespread which problems are. It used to be that the perception was that plagiarism was not much of an issue in the scholarly literature; now most major publishers share a database that they actually use for automated plagiarism checking as part of their editorial processing workflow. What do we believe about this new world of factual biography and author identity? Does it need to have a high level of assurance, or are we just going to assume that basically almost everyone is friendly enough that we should take them at their word? So those are a few places, specifically around security and privacy, that I just want to take note of. I will say that, particularly in the commercial sphere, it is stunning how much we don't know about how personal information is passed around and resold, and there's a lot of work to do just understanding what's happening in that area. I will come back a little later to one more aspect of privacy and security, in the context of human subjects and research data, but bear with me as I push on here.

So, another clear trend is that we are really, genuinely, pretty serious about research data management at this point. We are still waiting eagerly, and I think with growing impatience, for some of the policies to come out of the federal

funding agencies that actually tell us what the ground rules are going to be in implementing the OSTP directives about access to federally funded research outputs. But meanwhile, in the broader context, we're seeing a genuine focus on data sharing; big data is one of the fashions of the month. Many of you probably had an opportunity to see Phil Bourne here in the one o'clock slot today, or in other venues. He was appointed earlier this year as the first Associate Director for Data Science for the whole National Institutes of Health, and I think the creation of that role was another underscoring of how important they recognize data management and research data to be; they are, of course, building on a long and visionary history that goes back particularly to the work of the National Library of Medicine in the 1970s and beyond. But you're seeing this in other government agencies, and you're certainly seeing it in business. One of the developments I've watched with considerable fascination is city governments getting very interested in big data to run better cities, and the emergence of a series of academic centers in what I can only describe as urban informatics, fundamentally, that are working in close partnership with their host cities, both in the United States and abroad; I think there are some very, very important things starting to happen there. SHARE, which from my point of view is going to be kind of a backbone inventory and analytic tool for understanding research data responsibilities within the research and higher ed sector, I think now has a relatively clear vision of where it's heading and is starting to move along; I think that will be significant.

There are still so many things that we aren't coping with very well in this area, though. Data involving human subjects continues to be a huge problem, and I'm not sure we have an effective conversation framed yet between those who come to the
issue in terms of human subject privacy and dignity protection and those who come to it from the view of what we can achieve if we can genuinely share and reuse data very freely. We're looking for ways to advance that conversation, but I want to tell a story that I keep coming back to and turning over in my mind, because it's just so amazing at so many levels, and it illustrates a number of fault lines that are developing here. We've talked a little bit before about how there's a whole sort of alternative universe of social science evolving out there in the commercial sector: people doing experiments and studies that you could never get away with in, you know, a university, but nonetheless studies that I think they sleep well at night having done, and where they have been genuinely respectful and thoughtful about privacy and impact, just not framed the same way that our traditional IRBs in academia frame them. So, some of you probably know the story, but you should all know the story. Sometime earlier this year, I think in late spring or thereabouts, the Proceedings of the National Academy of Sciences publishes a paper. Now, this is a paper that is jointly published by some researchers at Cornell and some folks at Facebook research, and here's the basic idea, if I've got it right, at least in a nutshell. There is this theory called emotional contagion, and the short version of it is that if your circle of friends keeps telling you depressing stuff all the time, you will probably tend to reflect depressing stuff back to them. There's a sub-theory that says that some people flip the other way and get aggressively cheerful when surrounded by depressing stuff, but they're sort of outliers. So this is the general theory,
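As an aside: studies like the one described next typically operationalize "depressing" posts with simple lexicon-based sentiment scoring, counting positive and negative words. Here is a toy sketch of that mechanism; the word lists, threshold, and function names are invented for illustration and are not taken from the actual study (which, as I understand it, used a much larger lexicon):

```python
import random

# Invented word lists for illustration; real sentiment lexicons are far larger.
NEGATIVE = {"sad", "awful", "terrible", "lonely", "miserable"}
POSITIVE = {"happy", "great", "wonderful", "fun", "love"}

def sentiment_score(post):
    """Positive-word count minus negative-word count; < 0 reads as 'depressing'."""
    words = post.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def bias_feed(posts, keep_fraction_positive=0.5):
    """Crudely bias a feed toward negative posts by randomly dropping positive ones.

    A real system would randomize per user and log exposures for later analysis;
    this only illustrates the basic mechanism of filtering on a sentiment score.
    """
    biased = []
    for p in posts:
        if sentiment_score(p) > 0 and random.random() > keep_fraction_positive:
            continue  # drop this positive post from the feed
        biased.append(p)
    return biased
```

The same scoring function, applied to what users subsequently post themselves, is then enough to ask whether the biased feed shifted their own output.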

and someone came up with a brilliant idea: hey, we could test this on Facebook, and actually we could get a pretty big sample. The speculation is that it's a reasonably subtle effect, so that in order to get a big enough sample size to tell you much, you've got to get a lot of people into the experiment. So what these folks did, as I understand it, is that they twiddled the algorithm that makes up your Facebook feed to bias it a bit towards depressing kinds of posts, and in order to figure out which postings were depressing, they did a little light computational sentiment analysis. They then looked at what the people on the receiving end of this depressively biased feed were sending out, and did a similar sentiment analysis on that to see if they were trending a little depressed. If my memory serves me, they did this to something like 60,000 people, a drop in the bucket for Facebook, not even a serious experiment by their standards, but huge by the kind of numbers the academic world generally thinks in terms of, and they actually found that there was a little bit of truth in this. So that was the experiment, fundamentally.

This paper came out, and people started just freaking out in various directions. There was one set of people, mostly academics, who said: what IRB allowed this? Was there even an IRB in the loop? This is outrageous; where were the twenty-page informed consent forms every time you log on to Facebook? That sort of reaction. Then there was another group that said: you know, actually this seemed pretty harmless if you think about it; it's not really a good experiment to do with informed consent, for various reasons, because that kind of undermines what you're doing; it's not really a very dangerous one, there being no evidence that people were jumping out of windows because their feed was so depressing, or anything like that; and it actually should be viewed as a pretty clever experiment, and we ought to think about other things we should
do like this. And of course, like most ethics debates, this really hasn't been resolved. There are people who I think have made very cogent and particular cases on both sides of it. There are people who I think are quite legitimately worried that things that are normally product optimization in industrial settings may now potentially be reframed as human subjects experiments, with enormously destructive effects on a lot of industry, although there are also people wondering if maybe we do need a little regulation in some of this.

As a side note, I had an opportunity this fall to attend a conference at MIT on digital experiments; hopefully they will get around to putting the video from this up, and I will share it on CNI-announce. I can guarantee that at least ninety percent of you here have heard these sorts of ideas about A/B optimization on these large services: you think, well, if I move this from the left-hand corner to the right-hand corner of the screen I'll sell a few more things. And you probably know that companies like Amazon and Google regularly do live testing of tweaks to their interfaces and algorithms, where they just designate 1 out of 5 or 1 out of 50 users coming in to get the new thing and then look at behavior. That's been well known for years. What's less well known is that they're not doing A/B testing anymore; they're doing thousands and thousands of tests a year, with incredibly complex statistics to measure and avoid crosstalk and cross-interference between these tests. They run these tests for a very, very long time, sometimes looking for pretty low-level effects which, if you add them up over the scale of the system, turn into real

money in the course of a year. It doesn't take much optimization to make twenty or thirty million dollars a year; this is way down in the decimal points. So there's a huge industry in this.

But here's the part of the Facebook thing which surprised me the most, and worries me the most in some ways. There were a lot of unhappy people who were Facebook users, and the reaction there was: how dare you mess with the Facebook feed algorithm, which was handed down perfect somewhere in the distant past? They don't understand that actually there are a thousand engineers at Facebook fiddling with that algorithm and running twenty different versions of it every day, and that the only thing different here is that they weren't trying to optimize on an objective function of more dwell time so Facebook can sell more ads. It appears that one of the things this incident really put a spotlight on is how little people understand the extent to which their interactions with all kinds of things are shaped and personalized by algorithms, in random and non-repeatable ways.

This popped up in another context recently. I had an opportunity to join a discussion hosted by the Reynolds Journalism Institute out in Columbia, Missouri, on preserving the news, and one of the things that people don't realize is how personalized news has become. If all of us go to The New York Times, I assure you we're not all going to see the same page, not even close. So the notion of what it means to preserve web pages, of whatever kind, in environments that are intensely personalized, and where in fact the personalization algorithms are under rapid experimentation, evolution, and evaluation, is a very serious and intellectually complicated problem, one which I think we're going to see showing up in many different contexts. It's also one that I think deserves some attention as we think about what the important points are these days to be stressing in terms of
information literacy and understanding your information environment. So that incident, I think, raises a hundred things. It asks questions about reproducibility of results. It asks questions about what's appropriate ethically and where business trade secrets fit in. It raises questions about what kind of evidence we should be collecting to support future research in these areas. I think it's a very salient example of some of the challenges around big data research.

We've also started some conversations, and there was a very helpful CLIR workshop on Sunday that probed this matter a little bit, about things that we have in our archives and special collections that need to be restricted in some way, or that are in very ambiguous status. Very good examples are materials collected, let's say, before 1900, when certain present-day norms about collecting data and how to do anthropology simply weren't in place. Telling somebody that, well, this would be regarded as probably unethical research that couldn't pass a review board today is very problematic: this is the only research we've got about some of these things, and the only research we will ever have on some of these lost civilizations and places. We have to talk about these things and what to do about them, no matter how awkward it is, and I was just thrilled to see that complex of issues starting to surface.

Let me move from there, though, and talk about another one of these sorts

of adjacent areas that are very troublesome as we consider these issues around digital science, around reproducibility, around what to do with research data management in the long term: and that's software. We often make this kind of casual set of statements: oh yes, software preservation, software sustainability, big problems, got to do something about those. Well, we do have got to do something about those, and I think it's time to take a much closer look at what's going on in some of these areas and what the impacts are. I think there is massive confusion right now about what sustainability means, about the difference between sustainability and preservation, and about what it means to preserve software; actually, I think there are probably about five distinct and more or less separable uses of that term preserve, which seek to accomplish different things. It's really time, I believe, for a bit more nuanced analysis there.

I think this also ties into some phenomena that we have a very poor grasp on, and that we touched on in the opening conversation in the context of enterprise and other large-scale software, and that's rates of obsolescence and change. In some sense it's very desirable to always keep everybody on the current version: if you're developing that software, it's easier to deal with only one version in the field, and it's easier to introduce changes if you're automatically keeping everybody up to date. But the flip side of that is that, particularly when you look at the commercial world, vendors in various positions have enormous motivations to put people through what are frankly, at best, unproductive cycles of very short-term forced obsolescence. I believe you can find some support for a hypothesis that says open source software does better with this. If you look, for example, at the backward compatibility of something like Linux, I think there's some evidence, although there are people who know much more about this than I do, and these
things are hard to quantify, that they've done a pretty careful job of backward compatibility. I would contrast that with, for example, Apple, which has just gone almost crazy in the last couple of years with these very, very aggressive short cycles of forced obsolescence, which cause a lot of collateral damage to the broader software environment every time they do it: all of the applications that somebody needs to find money to rewrite. I think we need to understand things like these forced obsolescence cycles and what they imply for tool selection in areas like digital humanities, where there's not a lot of money to rewrite everything every year, or even indeed in the sciences, and what difference they make to our various kinds of preservation and sustainability strategies. We've seen a set of new tools starting to emerge in this area based on virtualization technologies; there are a couple of fascinating projects that try to approach software preservation through emulation, Rothenberg's old dream actually, genuinely, sort of come true. I think this is an area where it's high time to really look at what's going on, and it is also coupled to this movement of things to the net that we opened the meeting by talking about. So that's an area that's on my mind, and one that I hope we can explore together a bit during the coming year.

We, as I mentioned, did an executive roundtable here on supporting digital humanities at scale, and you can look forward to a report on that sometime in the coming year. And I do want to note that, shortly before this meeting, we launched quite a deep

set of web pages that go with a summary report that was included in your packets dealing with the role of digital scholarship centers which is something that joan lippincott has been leading an investigation of for some time these are very closely linked I think it seems clear that various kinds of diffusion and support centers digital scholarship centers more faculty oriented and disciplinary oriented digital humanities centers in some cases these are all important mechanisms for diffusion of information about technology and methodology among faculty they seem to have a particularly key role in digital humanities although their use now is extending much wider than that in some cases we’re going to be doing some follow on work there I believe in the coming program year one of the things that was striking and this is often the case when we pull together a workshop is we asked the people who were coming to the workshop on digital scholarship centers to complete a brief form saying what they were doing and how long they’d been doing it and things like that so we could get some sense of the sorts of projects represented and those are all available on the web page is dealing with the thing but we got a lot of requests in that had the general form of we are thinking about this looking at it planning for it trying to decide what to do about it etc and it would be really helpful for us to be able to sit in on the discussion and learn from the folks who are already deep in we decided that we would turn those folks away from the actual meeting in order to keep it small and have the discussion we wanted to have but provide them the report and the other information from the centers who were the scholarship centers who were represented there and I think that will be very helpful for those folks but there seems to be a good interest in a good deal of interest in going forward and looking at some form of workshop that is for institutions who are trying to plan such a such a 
center, to try to help them identify the key choices and planning parameters, the success factors that have worked for others, and things like that. So we expect we will be doing some kind of an event, very possibly (although this is not settled yet) connected with the spring CNI meeting, which will look at this and provide an opportunity to engage planning issues around such centers, rather than operational issues for those who've been at it for a few years. We also recognize that there are a lot of things that have the word "center" in them, and actually they are widely varying in character and purpose. There is no right answer, at least unless you put me in charge of terminology and I can just make sort of arbitrary decisions; there's no right answer to what is and isn't a digital scholarship center. But we think that it might be a real help for people who are trying to do things in this area to at least summarize the points of disagreement and the different kinds of things that are parked under these headings, so that when we have conversations about them going forward we can be clear about which conversation we're having, and do a little less talking past each other, or lengthy definitional negotiation, before we can get on with things. That one, if we go ahead with it, will probably be a smaller event; we'll invite a few people who've had some deep experience there, and try to produce a relatively succinct document with some examples that will facilitate conversation in that area. This whole question of how we diffuse technology and new research methods, which seems to be so heavily focused on the humanities today, keeps coming back again and

again, and I think it's really important that we try to get some better handle on what's going on here. I want to close by talking about a sort of big strategy thing that I've been thinking about for a few years and have touched on before, and if you've been going to the right sessions here you will have seen a number of sessions that help advance this agenda in various ways. So here's my fundamental thesis, without getting into a lot of detail. If you ask the question, how are we doing in terms of preserving and providing stewardship for our cultural memory in the society broadly (that would subsume, but by no means be limited to, the scholarly record; it includes our culture, our records of government, a whole lot of things, many of which serve as evidence for scholarly investigation), nobody can answer it. They can maybe answer it for little sub-pieces, but even there the answers are poor. If you ask the obvious follow-on question, are we doing better this year than we were last year, we have no idea how to answer that. Or here's a really salient question: suppose I agree we need to do something about this, and we're not doing as well as we should; how much would it cost to do fifty percent better than we're doing now? We have no idea where to even begin to answer that question. But these are questions I think we need to start getting a handle on, and there have been some point investigations there, like the excellent work of the Keepers Registry and related activities about who's archiving what scholarly journals and what scholarly journals are not being archived, and the studies that Columbia and Cornell did on what proportion of their periodicals were in fact archived; that was very helpful. At this meeting we had a report on some studies that are being done of digital-only music, and the short bottom line on that right now is that, other than copyright deposit into the Library of Congress, we
basically have no mechanism to get that material into any of the institutions that are concerned with the cultural record. Very bad. In mass-market books you see a very similar kind of problem emerging. We know that we are in an unfortunately less and less slow-motion train wreck dealing with the audiovisual materials of the 20th century: the equipment is mostly gone, the memory of the formats is going fast, the actual media that the stuff is written on is busy rotting away. This is a big and urgent problem, but until recently it's not a problem that we could get much of a handle on, other than "well, it's really big and it's really ugly," which is not a persuasive pitch to make to a donor. This is going to require some big infusions of funds in the interest of cultural memory, and if we can't provide some pretty good ballparks we are at a vast disadvantage. Indiana University a couple of years ago took a pretty systematic inventory of their problem; it was big, it was ugly, it was expensive, and they actually were successful in winning a sizable down payment on the problem from the leadership there. Earlier this morning we had a report from New York Public, which is a very special kind of institution because of where it is and its ability to raise funds in the whole context of the New York business and cultural community. They are just completing and sharing results, and their numbers are bigger and uglier than

Indiana's by a substantial margin, but I think that it is a huge contribution to be able to quantify the scope of this problem, and I sincerely hope that they are able to take the next step from there to mobilizing resources to deal with it. At the Q&A in that session, other institutions indicated that they were starting to move on this front, and actually one of the good things that's starting to happen, now that we have a couple of stakes in the ground, is that some of the methodological work is more stable, so getting those surveys done is getting a bit easier. Those give us some sense of the scale of the problem in various areas. The work that OCLC Research is carrying forward (for example, their workshop tomorrow on the evolving scholarly record and the boundary points around it) and the conversations about research data management are all parts of understanding this collection of problems. The discussions we've touched on here and in other venues about what it means, for example, to preserve the news are all part of a very broad agenda of mapping and trying to measure the scope of the enterprise of preserving and providing stewardship for our cultural heritage and cultural memory. I hope that in the coming year we are going to be able to feature some more important sessions there, draw your attention to other work that's going on in that sphere, and perhaps identify some additional problem areas that could use some attention. NYU has done some very good work in recent years on consumer video of various kinds. I suspect there are horrible things waiting to be discovered if we look at video games of various sorts, and there have been some preliminary investigations there, although I think they have probably been more focused on the difficulties of handling individual video games, as opposed to dealing with this problem in the large. Some of this is further complicated by the fact that a lot
of these industries, everything from the news industry to the book publishing industry, are getting viciously restructured along various lines, with mergers, failures of companies of long standing, and other sorts of things that keep changing the landscape and challenging our ability to even get good basic statistics about what's going on, so that we can do the kind of analyses that folks like the Keepers, or Cornell and Columbia, have been able to begin to do around scholarly journals. So I would just leave you with that as a kind of overarching challenge that you will be hearing more about, and hopefully seeing more insights into. I think thinking about it in that systematic way is a helpful thing to do, as a way of coordinating, measuring, and prioritizing our collective work in meeting our social needs in this area. So those are some of the things I think you can expect to see increased attention and, in some cases, some specific actions on from the Coalition, working of course in partnership with many other organizations, and ultimately in partnership with our members. That's about all I want to say today. I would be very happy to take a couple of questions on our program going forward, things we have been doing, things we've not been doing, or thoughts on other developments that we haven't touched on. So thank you for listening, and the floor is open. Takers? There are microphones there. Ah, oh, well, I thought we had a live one

there. No takers, really? I thought there was somebody there; something in there must have worried someone. Okay, well, it has been a very... go for it.

Thanks, Cliff, and thanks for a great overview of the program for the next year. In the opening plenary, which touched on some of the topics you just mentioned, there were some questions in the audience around preservation and some of the risks involved, and you mentioned some of those. What are your thoughts on the risks of organizational preservation, related to the other preservation issues we're dealing with? And, attached to that, your thoughts about open source as an exit strategy, in association with the chances of organizational failure?

Those are both great questions. So, knowing that something is open source as an exit strategy is something that leaves me feeling better than not having it, but not a lot better. It's a little bit like a similar pattern that was used for commercial software going back into the '60s and '70s, called source escrow, where the idea was that if your application software provider went broke, you would be able to pop open this safety deposit box, and hopefully there would be some reasonably current source code in there, and maybe even some documentation, and you would be able to go forward. It's kind of a lot like the open source model, except that you don't get to inspect whether you've really got an adequate source base until it's too late to do anything about it. So the open source model is clearly superior in that sense, and there are ways you can deal with the escrow thing where you have third-party audits and stuff like that, but the transparency of open source is really kind of nice in that sense. Practically speaking, most institutions that have been running something as a hosted network app, where somebody else has been taking care of everything for the last five years, and that suddenly goes away and they have to stand up the open
source code in another facility, are probably in really deep trouble. They don't have the expertise internally to do that; they don't have the deep knowledge and experience with the code to figure out how to provision hardware for it and optimize it; they are at a big disadvantage. Sometimes they can work their way out of that disadvantage, usually by spending a lot of money rather rapidly. If, for example, the company goes bust and you can hire up three of their lead developers to rehost the code at your place, you've got a significant advantage over not knowing anything about it and having to learn it. But I do see it as sort of an expensive last resort in most cases for these big complex systems. The business of institutional vulnerability and failure is one that has been a lot on my mind lately, because, going back to these structural comments about the cultural record, one of the things that seems to be true is that there's a lot of culturally important material parked here and there in companies or other organizations whose business models are falling apart. So just take the case of newspapers. There used to be a lot of newspapers in this country; there used to be a ton of local newspapers. Then people started buying them up and basically minimizing the amount of local news in them; they lost their advertising revenue to Craigslist; they lost market share to television and the web and other sorts of things; and they went broke. We have enormously fewer newspapers in this country today than we did 20 years ago. All of those newspapers had not only back issues, which mostly were parked in libraries, so we've got those, what was published, but also very deep databases in the form of morgues. So, you remember, the news used to be a very

parsimonious thing, right? You'd send out a photographer, you'd take 20 photos, you'd publish one, and the other 19 would go in the morgue for possible future use. Now you send somebody out, you take 20 photos, you put one on the front page or in the print edition, if you're still making one, and then say "click here for more photos." So all of those archives, most of them (and there are some fascinating exceptions), got lost when the newspapers failed; some of them ended up on eBay. We are very bad at transferring assets, particularly out of corporations into cultural memory institutions, at scale and under serious time and economic pressure. Indeed, we don't even necessarily have clear ideas of whether this is appropriate behavior, or whether they should be putting it up on eBay as part of the bankruptcy. We could talk about a lot of other examples, but that's just one of them. I think this whole business of transitions of stewardship is a very key problem that requires real close attention. I've been looking at a couple of case studies over the last year or so, but to me it's starting to emerge as one of the kind of systematic challenges we face here.

You know, I can't resist building on that one, Cliff, so you just nailed it, the systematic challenges there. Katherine Skinner, Educopia. Yeah, with the newspapers as an example, and just probing that a little bit more deeply: one of the things that I've noted, and that I've heard you note in multiple settings, is how challenging it is to get the right people around a table when the right people aren't all in the same field, or even in the same connected set of fields. So do you have any advice on how to address some of these system-wide problems, like the failure of news and the failure of our preservation of news, which are intertwined, but that involve stakeholders that include the journalists themselves, who have been disempowered; the local news owners, of whom some still exist, but not very many, and they've been disempowered; the conglomerates, which
don't know what to do with newspapers, they're not making money off of them, so they're shoving them aside; plus the libraries, the public libraries. I mean, it's a swirling system. I'd love to hear you speculate on how to start to address that.

Well, I mean, obviously it takes collaborations among multiple players. What's really hard here is that you need convening organizations that can convene across an incredible array of places. You know, the Library of Congress, for example, just by virtue of the brand of the institution, has some substantial convening power, but not in some of the areas you name. So we clearly are going to need groupings of people. I would really love to see some outreach to people who are involved in corporate leadership and governance who have an interest in some of these things, to start framing them from that perspective: people who are on the boards of directors of newspapers and presumably, as part of their governance role, are trying to do some balancing of corporate and private interests in some cases. We need people at that level to be part of the conversation, and some of that, in turn, I think presents us with a huge challenge in talking about these problems in a way that is accessible and clear to the thoughtful general public, and doesn't get too far down in the weeds. It's very dangerous to let this, I think, careen over into narrowly legal discussions, when really this is much more a discussion about what sort of a culture we want to be, which is a different kind of conversation, and we need, I think, to really seek ways to do that. Some of them maybe are going to have to return to some very traditional kinds of thinking about getting messages out

through various kinds of campaigns. Remember the films that CLIR did, in, I guess, probably the 1990s now, Slow Fires? I think that those were quite legitimate and somewhat effective efforts at communicating some of these issues. Maybe we need to do things like that, I don't know, but clearly the scope and breadth of the convening challenge here is really huge. I think people are about ready; I think planes are calling. Let me wish you safe travels, let me thank you for joining us, and thank you for your support of the Coalition. Hopefully I will see many of you in Seattle, and I suspect I'll see many of you in other venues between now and April. Have a very good holiday season and end of the year. Thank you.