Computing College Helps Non-Profit Build Public Data Access Tools
DataSourceNJ, a startup non-profit organization that aims to democratize access to local public data, is turning to NJIT's Ying Wu College of Computing for technical expertise in data access, organization, analysis and visualization.
The organization is led by entrepreneur Michael Goldstein, data analyst Greg Frank and veteran journalist Rod Hicks, who believe that clever software can give small newsrooms and the general public access to information previously limited to corporations, national-level media and people with deep pockets.
"We became aware of the impact the moribund state of local journalism was having on our democracy," Goldstein said. "The work we want to do is through automation and machine learning, [to] help people answer questions and to provide analytical tools and presentation tools, so that insights can be gleaned in seconds versus months."
To make it happen, Goldstein approached NJIT for prototyping and was introduced to Munir Cochinwala, who manages many of the college's software development projects. Cochinwala and computer science graduate student Nikita Nemane will develop easy-to-use software that synthesizes public data such as campaign contributions, permits, property taxes and public records of all stripes.
"You can do a query across all of these [data sets] without being a technical person," Cochinwala explained. "We're going to do appropriate data cleanup and data reconciliation so the data and resulting answer makes sense," for example by distinguishing between street names and human names, or between funds for salaries and revenue from taxes.
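To give a sense of what that kind of cleanup can involve, here is a minimal Python sketch. It is not the project's code: the function names, the list of street suffixes and the matching rules are all assumptions made for illustration. It flags values that look like street names and reduces differently formatted spellings of a person's name to one comparable key.

```python
import re

# Hypothetical, deliberately short list of street suffixes (an assumption;
# a real system would need a far more complete list).
STREET_SUFFIXES = {"st", "street", "ave", "avenue", "rd", "road",
                   "blvd", "dr", "drive", "ln", "lane"}


def looks_like_street(value: str) -> bool:
    """Heuristic: treat a value as a street name if its last word is a street suffix."""
    tokens = re.findall(r"[a-z]+", value.lower())
    return bool(tokens) and tokens[-1] in STREET_SUFFIXES


def normalize_person_name(value: str) -> str:
    """Reduce variants such as 'Smith, John A.' and 'john smith' to one key."""
    value = value.strip()
    if "," in value:
        # Reorder 'Last, First' into 'First Last'.
        last, first = value.split(",", 1)
        value = f"{first} {last}"
    tokens = re.findall(r"[a-z]+", value.lower())
    # Drop single-letter tokens such as middle initials.
    tokens = [t for t in tokens if len(t) > 1]
    return " ".join(tokens)
```

A heuristic this simple will misfire, for instance on a person whose surname is Lane, which is one reason the results of any such step would need review.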
Users will be able to cross-reference their findings to determine, for example, whether the majority of a public official's political donations came from that official's subordinates or from people who solicit public contracts. "Machine learning capability will be provided for analysis. The solution will be used to identify clusters and correlations as well as deviations in the data," the agreement between DataSourceNJ and NJIT states.
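The cross-referencing described here can be pictured with a short Python sketch. Everything in it is hypothetical: the function name, the donor names and the dollar amounts are invented for illustration, and it assumes names have already been cleaned so that the same person matches across data sets. It computes what share of donated dollars came from subordinates, from contractors and from everyone else.

```python
from collections import defaultdict


def donation_shares(donations, subordinates, contractors):
    """Return the fraction of total donated dollars in each donor category."""
    totals = defaultdict(float)
    for donor, amount in donations:
        if donor in subordinates:
            category = "subordinate"
        elif donor in contractors:
            category = "contractor"
        else:
            category = "other"
        totals[category] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no donations, so no shares to report
    return {category: amount / grand_total for category, amount in totals.items()}


# Invented sample records: (donor name, dollars donated).
donations = [
    ("john smith", 500.0),
    ("ana lopez", 250.0),
    ("acme paving", 1000.0),
    ("pat jones", 250.0),
]
subordinates = {"john smith", "ana lopez"}   # hypothetical staff list
contractors = {"acme paving"}                # hypothetical vendor list

shares = donation_shares(donations, subordinates, contractors)
```

With these made-up figures, half of the donated dollars trace back to a contractor and 37.5 percent to subordinates. A journalist would treat numbers like these as a lead to follow up on, not as a conclusion.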
The software will be adaptable so that new data sources can be added as they're discovered, and new methods of analysis can be added as they are needed.
"It's not going to be perfect. We're just trying to be reasonable about it," Cochinwala noted. "It's a challenge because you can't really integrate data across many places, perfectly. We'll have to make some allowances and do some machine learning … Once we have something working, then we can get some journalists to look at it, and we can even get the student newspaper to look at it."
Goldstein added that he hopes student developers will be selected based on their dedication to making the world a better place, not just their desire for a good portfolio project or work experience. He also would like the system to be open-sourced when it's ready for use.
"We're not doing this for money or fame. We're doing this because we want this to happen. I'd love to create a Center for Data Democracy at NJIT," he said. "We don't want to be viewed as being partisan, outside of if you're antidemocratic, we don't want you involved."