- Inventor of the World Wide Web, Sir Tim Berners-Lee;
- Archivist of the United States, David Ferriero;
- Professor James Hendler of RPI, who will moderate the discussion;
- UNC professor and Director of iBiblio.org, Paul Jones;
- Andrew McLaughlin, Deputy U.S. Chief Technology Officer in the Executive Office of the President; and
- University of Southampton Professor Nigel Shadbolt, Director of the Web Science Trust and the Web Foundation.
Notes only; no analysis.
Q: In the past couple of years, increase in govts experimenting with public release of data. What are the arguments that are persuasive and what’s the future?
Tim Berners-Lee : My favorite argument – I think the main reason for putting gov’t data on the web is that it is a resource with huge potential value. It was collected for doing one thing – but making it public allows mashups, extends the value of the data. If it is exposed, you get serendipity — you put it up for one reason but people will use it for something you haven’t thought of. [Kathy-> basic truth about innovations.] The value is hugely more than the bandwidth. Yes, it takes some effort; you may have to persuade other people. Imagine you’re thinking about moving your company or establishing a new one: how do you figure out how business-friendly a country or state or city might be?
If govt puts data out there about how the govt runs, this transparency enables anyone to hold the govt accountable. Transparency. In the UK that was a very strong feeling inside the government – stronger checks and balances.
McLaughlin : There is a third argument. Open data helps you do your job better, more effectively, cheaper. Now the issue is to develop incentives so that people are willing to share data. Disruption. Scarcity in the form of control over data is being disrupted. Revenue generators like maps — when you pull the plug on that control, that agency business model and reason for existing has been obliterated. Rules, cajoling, etc required to get long-term culture change. Persuading agencies that they can look better, do better, be more effective, minimize headaches by freeing data and getting it up on the web by default. 1st set of executive memos from Obama Admin dealt with open data as well as FOIA — don’t wait for people to ask.
Q: data release in RDF format right from the start – advantage?
Nigel Shadbolt : We were appointed in June; we used open source software. This was not a traditional IT procurement in the gov’t sense. We wanted the data published in any machine-readable format. We were happy to have CSV or XLS files — but the goal is RDF, linked, so that we can design URIs. How do you label roads and postal codes? You can mix vocabularies, so agencies don’t have to use the same schema. 3200-and-something data sets, but only a small %
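A minimal sketch of the "mix vocabularies" idea Shadbolt describes: in linked data, predicates are full URIs, so two agencies can annotate the same resource with their own vocabularies without agreeing on one schema. All URIs and names below are illustrative assumptions, not real data.gov.uk identifiers, and triples are modeled as plain tuples rather than with an RDF library.

```python
# Triples as (subject, predicate, object) with full-URI predicates,
# so vocabularies from different agencies coexist on one subject.
# All URIs are illustrative, not real government identifiers.

DCT = "http://purl.org/dc/terms/"       # Dublin Core: a shared vocabulary
ROAD = "http://example.gov/def/road/"   # hypothetical roads-agency vocabulary

triples = [
    ("http://example.gov/id/road/A414", DCT + "title", "A414"),
    ("http://example.gov/id/road/A414", ROAD + "surfaceType", "asphalt"),
    # A second agency annotates the SAME URI with its own vocabulary:
    ("http://example.gov/id/road/A414",
     "http://example.gov/def/postal/nearestPostcode", "AL1 1AA"),
]

def properties_of(subject, triples):
    """All properties asserted about one subject, regardless of vocabulary."""
    return {p: o for s, p, o in triples if s == subject}

props = properties_of("http://example.gov/id/road/A414", triples)
print(sorted(props))
```

Because the shared thing is the URI, not the schema, a consumer can merge both agencies' statements with a simple lookup — the "mashup" value TBL describes above.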
Q: Chief Archivist – does this data release keep you awake at night? How do you approach the archiving.
David Ferriero : I think what keeps me up at night has more to do with the culture of the gov’t – it’s not a tech problem but a social problem. Help agencies think differently about how they work. Must develop a plan on how they are going to be participatory. Empowered the agency to rethink how it works; it has tremendous transformational power. It’s the beginning of something very different. My job is to ensure that the records of the govt (254 agencies & the president) are gathered and available in perpetuity. Email is recognized as a record on the presidential side but not the agency side. [Kathy: =:-0]
Q: Linked data
Paul Jones : Reframes the question. Watch TBL TED talk. Towns have created exclusionary zones by denying access to sewer, water, education. The Cedar Grove people in Ohio used GIS (from the web) that shows this exclusionary principle. Mebane, NC (?) used an anti-open-document law — documents that had been open were changed — under an anti-terrorism act. The city had moved its GIS into this “anti-terrorism” protection. [Kathy : =:-0]
For social change to happen … citizens in the US move largely based on where schools are … we have people who come and say “we’re going to be in Chapel Hill for 8 years” and choose homes based on access to the educational system.
David Ferriero : Location of ? tower in London was a state secret. In general, the presumption of making it open — Econ 101 : information helps the market
Q from Twitter
TBL : first order problem is getting the data there.
[missing a chunk of transcript]
TBL : open data stickers (throws a bunch into the audience – pix from @bethanyvsmith)
McLaughlin : part of the culture of govt that deserves preserving is the dedication to the quality of data – making it as accurate, reliable, and high-quality as possible. “Publish more data faster” is in opposition to this culture. [Kathy: publish then filter versus filter then publish] Example of Recovery.gov — a massive data dump — inevitably there were errors in the data. The right attitude is to say “thank you for pointing out the errors in the data” — but for the culture-change project, we can’t have this be about sacrificing authority and quality. This is a tension that just has to be resolved.
Need good meta-data that speaks to the quality and reliability of the data. We do this with economic data (v1, v2, v3). One of the tasks for those of us who care about meta-data — how do we signal the degree of confidence in the data.
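McLaughlin's "v1, v2, v3" point about economic data could be expressed as a small metadata record attached to each release. The field names here are illustrative assumptions, not any official schema — just a sketch of what signaling revision status and confidence might look like.

```python
# Sketch of dataset metadata that signals revision status and confidence.
# Field names and values are illustrative; no official standard is implied.
from dataclasses import dataclass

@dataclass
class DatasetRelease:
    name: str
    revision: int   # v1 = preliminary; later revisions supersede it
    status: str     # "preliminary" | "revised" | "final"
    caveats: str    # free-text note on known quality limits

    def is_preliminary(self) -> bool:
        return self.status == "preliminary"

# Economic-data style versioning: publish early, revise openly.
gdp_v1 = DatasetRelease("quarterly-gdp", 1, "preliminary",
                        "advance estimate; subject to scheduled revisions")
gdp_v3 = DatasetRelease("quarterly-gdp", 3, "final", "")

print(gdp_v1.is_preliminary(), gdp_v3.is_preliminary())
```

The design choice this illustrates: rather than withholding data until it is perfect, the release carries its own quality signal, so "publish then filter" and the culture of authoritative data can coexist.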
*** audience participation ***
TBL : anecdote about a blog post with data — within 48 hours of releasing the data, without gov’t procurement, this is where you will find your bike routes. When it’s in linked-data format instead of a 200-page PDF — examples.
Nigel Shadbolt : There are privacy implications, of course. In many cases — social welfare, for example — sharing personal data across agencies makes sense.
McLaughlin : another issue is location data. The peculiar nature of location data — my phone talks to the tower every 15 seconds even if I’m not using it — removing the usual identifier doesn’t help there.
Paul Jones : Thursday (?) morning danah boyd will talk about privacy and Friday morning <?> will talk about ‘rules for radicals’ (also privacy-related)
Q from audience : release early/release often ethic …. what sort of social changes might be put into place? how can you possibly protect someone in government so that they feel safe? What if the individuals releasing the data were “responsible” and treated govt as the “carrier” of open data?
McLaughlin : we are trying to revolutionize how govts relate to data. I don’t think simple tactical moves are the answer. Move away from “data nerds”. This whole thing will flop if the people who care about the data don’t grab it, mash it, make it their own. Common law problem, not civil law problem.
TBL : UK folks did work with communities — you could get early access if you joined a google group. If someone else comes and says “it’s not perfect” the community can speak to that.
Q From Twitter: power comes from mash-up, but what will be revealed by aggregation? Each data set can be examined — but when you put a lot of them out there, there may be revelation.
Nigel Shadbolt : It’s going to provide pressure on public services. Improving service delivery sounds good, but what happens as issues become “popularized” — fix that pothole. Suddenly investing huge amounts of money fixing potholes, but what is the trade-off?
McLaughlin : It’s not enough to just look at the personal implications of the data in front of you but you have to look at what is public already. That’s just the way it has to be. Agencies have to have data experts who talk to one another and work this through (inter-agency cooperation).
Paul Jones : app to report potholes in Mebane (?) NC is an open database, unlike those protected by Homeland Security
[missing transcript here, too]