Tuesday, October 15, 2019

Facebook AI Releases Searchable Code Snippets

This seems useful, and what Facebook has put on the web to support AI efforts is impressive. See more at the Facebook.ai site. Also note the claimed validation against ground-truth answers from Stack Overflow.

Facebook Releases AI Code Search Datasets, by Anthony Alford in InfoQ

Facebook AI released a dataset containing coding questions paired with code-snippet answers, intended for evaluating AI-based natural-language code search systems.  The release also includes benchmark results for several of Facebook's own code-search models and a training corpus of over 4 million Java methods parsed from over 24,000 GitHub repositories.

In a paper published on arXiv, researchers described their technique for collecting the data. The training data corpus was collected from the most popular GitHub repositories of Android code, ranked by number of stars. Every Java file in the repositories was parsed, identifying the individual methods. Facebook used the resulting corpus in their research on training code-search systems. To create the evaluation dataset, they started with a question-and-answer data dump from Stack Overflow, selecting only questions that had both "Java" and "Android" tags. Of these, they kept only questions that had an upvoted answer that also matched one of the methods identified in the training data corpus. The resulting 518 questions were manually filtered to a final set of 287. According to the researchers:

"Our data set is not only the largest currently available for Java, it’s also the only one validated against ground truth answers from Stack Overflow ...."
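The selection steps described above (keep questions tagged both "Java" and "Android", then keep those with an upvoted answer matching a method in the training corpus) can be sketched roughly as follows. This is only an illustration, not Facebook's code: the `Question` and `Answer` classes, the `select_candidates` function, and the whitespace-normalized exact match are all assumptions made for the example, and the article does not say how the researchers actually matched answers to corpus methods.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    score: int  # net upvotes on the answer (hypothetical field)
    code: str   # code snippet taken from the answer body (hypothetical field)


@dataclass
class Question:
    title: str
    tags: set   # lowercase tag names, e.g. {"java", "android"}
    answers: list = field(default_factory=list)


def normalize(code):
    """Collapse runs of whitespace so trivially reformatted code still matches."""
    return " ".join(code.split())


def select_candidates(questions, corpus_methods,
                      required_tags=frozenset({"java", "android"})):
    """Keep questions that carry every required tag and have at least one
    upvoted answer whose snippet matches a method in the corpus.

    Matching here is an exact comparison after whitespace normalization,
    which is an assumption of this sketch, not the paper's method.
    """
    corpus = {normalize(m) for m in corpus_methods}
    selected = []
    for q in questions:
        if not required_tags <= q.tags:
            continue  # missing one of the required tags
        if any(a.score > 0 and normalize(a.code) in corpus for a in q.answers):
            selected.append(q)
    return selected
```

The manual review that reduced the 518 candidate questions to the final 287 is a human step and is not modeled here.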
