Posted at 1:46 PM
I had the honor to speak at the Strata + Hadoop World Conference in San Jose, CA today. This Big Data conference brings together thousands of the smartest people in the tech and data world to share best practices in data science in the private, non-profit and public sectors. As “America’s Data Agency,” the U.S. Commerce Department plays a key role in providing the data that serves as the foundation for many of the innovations we see coming out of the private sector.
In fact, Commerce bureaus produce thousands of data sets ranging from the economy to demographics to trade, and our massive data collection on climate literally reaches from the depths of the ocean to the surface of the sun.
But none of this matters if our data is inaccessible or hard to use.
We can play an important role in fostering more data innovation if we make the best use of these resources. That is why we need to do a much better job of making our data easy to find, understand, and access. Our goal is to unleash our data, so technologists – at the Strata conference and across the United States – can use it in new and exciting ways to generate societal benefits and economic value.
To achieve this, we made data a key pillar of our Department’s strategic plan. We hired top talent – including our first Chief Data Officer, Deputy Chief Data Officer and Chief Data Scientist. To guide their work, we established the Commerce Data Advisory Council. Made up of 19 leaders from some of the biggest tech and data companies in the world – including Palantir, Code for America, and Amazon Web Services – this group helps us to identify the challenges we face, guides us towards solutions based on industry best practices, and provides a vision for a data-driven government.
And we launched the Commerce Data Service – an in-house data science and software development team that’s working closely with each of our bureaus to open up more data and to build great products and software that help us better achieve our mission. This impressive team of entrepreneurs and data experts from across the government and Silicon Valley supports each of the Department’s twelve bureaus as they incorporate data science to make their work more impactful, targeted, and cost-effective.
One of the challenges we’ve faced so far in our data push is that few people know the extent of our data sets. And even fewer know how to build innovative, useful tools from them. That’s why the Data Service created the Commerce Data Usability Project – to help data scientists and software engineers share tutorials, use cases and data visualization tools to showcase the range of our data sets.
One of the best ways to show our value is to have private sector partners explain how they are using our data. Today, I announced several new collaborations, including:
- Mapbox, the online mapping platform, is providing a tutorial on how to use NOAA’s rainfall data to help us understand our environment better and improve forecasts.
- Zillow is using Census income data to map housing affordability across the country. They’re looking at local salary information for emergency service personnel to show how much of their income they should expect to pay for mortgages in different cities.
- Earth Genome, an environmental nonprofit, is illustrating how to get started with NOAA’s topography data — a fundamental element in their wetlands restoration model for informing industrial investment decisions.
These are just a few examples of how our data can be used. The open code available on the GitHub pages for each of these tutorials allows users to build on these examples and to gain additional insight from this data.
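To give a flavor of the housing-affordability example, here is a minimal Python sketch of the underlying arithmetic. It is not Zillow's method or the code from the tutorial. It assumes a standard fixed-rate mortgage, and the function names, the 20 percent down payment, the 4 percent interest rate, and the sample figures are all hypothetical placeholders rather than Census data.

```python
def monthly_payment(principal, annual_rate, years=30):
    """Monthly payment on a fixed-rate loan (standard amortization formula)."""
    n = years * 12            # number of monthly payments
    r = annual_rate / 12      # monthly interest rate
    if r == 0:
        return principal / n  # no interest: principal is spread evenly
    return principal * r / (1 - (1 + r) ** -n)


def income_share(home_price, annual_income, annual_rate=0.04,
                 down_payment=0.20, years=30):
    """Fraction of monthly income that goes to the mortgage payment.

    The default rate and down payment are illustrative assumptions only.
    """
    principal = home_price * (1 - down_payment)
    return monthly_payment(principal, annual_rate, years) / (annual_income / 12)


if __name__ == "__main__":
    # Hypothetical figures, not real salary or home-price data.
    share = income_share(home_price=250_000, annual_income=45_000)
    print(f"Share of income spent on the mortgage: {share:.1%}")
```

Running the same calculation with local salary and home-price figures for each city is what would turn this arithmetic into an affordability map.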
Another project that we’re particularly excited about is our effort to use data to increase U.S. exports. The Commerce Department is responsible for helping American businesses reach new markets through our International Trade Administration. Across the country, there are a large number of export-ready businesses that could benefit from ITA’s services. But some of these businesses are more likely to export than others, and it can be difficult to identify pockets of opportunity.
To solve this challenge, our Data Service team is using machine-learning techniques to combine our existing datasets with private sector business datasets. This will allow us to develop a predictive model for characteristics of successful exporting businesses. This New Exporters Project will enable us to dramatically increase the number of businesses we help to become exporters.
With trade agreements like the Trans-Pacific Partnership – or TPP – creating new opportunities abroad, this service could substantially increase the level of exports to new and emerging markets, helping to grow U.S. businesses and our economy as a whole.
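For readers curious about what a predictive model of exporting businesses might look like, here is a minimal sketch in plain Python. It is not the Data Service team's model. Logistic regression is swapped in as one simple machine-learning technique, and the two features, the six sample firms, and every function name are hypothetical and made up purely for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this firm under the current parameters.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def export_score(weights, bias, x):
    """Estimated probability (0 to 1) that a firm resembles known exporters."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Made-up training data. Each row is a hypothetical firm described by
# [scaled firm size, share of sales made online]; label 1 = already exports.
FIRMS = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9],
         [0.2, 0.1], [0.3, 0.2], [0.1, 0.3]]
EXPORTS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    weights, bias = train_logistic(FIRMS, EXPORTS)
    for firm in ([0.85, 0.7], [0.15, 0.2]):
        print(firm, round(export_score(weights, bias, firm), 2))
```

A real model would draw on far more features and records, but the idea is the same: learn from businesses that already export, then score other businesses to find pockets of opportunity.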
Overall, we are seeing great value in our data efforts so far. Every part of our Department is starting to take a “think about the data” first approach to its work and is making sure the information it posts is open and accessible. We can use data to accelerate economic growth and create a better, more efficient world. I know that this effort will lead to the next great era of data innovation.