Dataset of Datasets!

My ideal dataset would be a single, complete collection of all of the Department of Energy (DOE) datasets. While looking for data for this contest, I found it difficult to navigate the many different websites and data formats currently used by different branches of the DOE. To make full use of the rich data available, it would be very helpful to have a single dataset that contained each of the other datasets in a consistent and coordinated manner.

This sort of dataset could be created using MySQL or other database software. A single table listing all of the major data sources or sets would be the primary structure. Each set would have a description detailing what could be found within it and would link to a table (or multiple tables) containing the actual data. Ideally, this underlying database would be accessible through a web-based user interface and would allow easy download of individual datasets in a variety of formats (CSV, XML, JSON).
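To make the proposed structure a little more concrete, here is a minimal sketch of a catalog table that points at per-dataset data tables, plus an export function for the three formats mentioned. It uses Python's built-in `sqlite3` module in place of MySQL so it runs without a server. The table and column names (`catalog`, `data_table`, `example_energy_use`) and the rows are hypothetical placeholders, not real DOE datasets or figures.

```python
import csv
import io
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one catalog table describing every dataset,
# each row pointing at the table that holds that dataset's records.
SCHEMA = """
CREATE TABLE catalog (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    office      TEXT NOT NULL,       -- publishing branch (placeholder)
    description TEXT NOT NULL,
    data_table  TEXT NOT NULL UNIQUE -- table that holds the actual rows
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory catalog with one placeholder dataset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder data table; names and values are illustrative only.
    conn.execute("CREATE TABLE example_energy_use (region TEXT, year INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO example_energy_use VALUES (?, ?, ?)",
        [("A", 2020, 1.0), ("B", 2020, 2.0)],
    )
    conn.execute(
        "INSERT INTO catalog (name, office, description, data_table) VALUES (?, ?, ?, ?)",
        ("example_energy_use", "Example Office", "Placeholder rows for the demo", "example_energy_use"),
    )
    conn.commit()
    return conn


def fetch_dataset(conn: sqlite3.Connection, name: str):
    """Look up a dataset in the catalog and return (column names, rows)."""
    row = conn.execute("SELECT data_table FROM catalog WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(f"no dataset named {name!r}")
    # Table names cannot be bound as parameters; this one comes from the
    # catalog itself and is quoted as an SQL identifier.
    cur = conn.execute(f'SELECT * FROM "{row[0]}"')
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def export(conn: sqlite3.Connection, name: str, fmt: str) -> str:
    """Serialize one dataset as 'csv', 'json', or 'xml'."""
    columns, rows = fetch_dataset(conn, name)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(columns)
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "json":
        return json.dumps([dict(zip(columns, r)) for r in rows])
    if fmt == "xml":
        root = ET.Element("dataset", name=name)
        for r in rows:
            rec = ET.SubElement(root, "record")
            for col, val in zip(columns, r):
                ET.SubElement(rec, col).text = str(val)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```

For example, `export(build_demo_db(), "example_energy_use", "json")` returns the two placeholder rows as a JSON list of objects. Keeping the catalog separate from the data tables means a web interface only needs to query one table to list everything available, and the same export path serves every dataset regardless of its columns.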

Such a unified dataset would give data scientists and visualizers a better sense of the full depth and breadth of the information available to them. Furthermore, the most interesting and useful insights from data often come from combining several different datasets. As things stand, it is difficult enough to find one dataset, and far harder to find another that is compatible with it or interesting to use alongside it. A single reliable resource would make this sort of analysis much simpler, leading to stronger and more novel insights.

Most people who have done data analysis can attest that finding and cleaning the data is often the most difficult part of an analysis. Providing people with a dataset that is already relatively clean and well organized would do more to promote the use of the data than anything else could. People are always looking for more data to build graphics and apps with, and there is no doubt that the Department of Energy has a great deal of data that is both interesting and important for people to see. The data would recommend itself if it were in a simple and accessible format.

For all of these reasons, my "wishlist" dataset is not some new feat of data collection, but rather a reorganization of all of the wonderful information you already have. For the rest of this contest and for future users, I think this would be the best dataset you could make!



Idea No. 126