Better Together

This assignment is based on this.

Please try this and this. Many names, such as David Madigan and Noah Smith, are shared by several different researchers, which makes them difficult for search engines such as Semantic Scholar. This exercise provides a visualization to help resolve some of these ambiguities.

Tasks

Write a program that inputs a name such as Noah Smith and outputs a visualization like this:

Note: Your picture will look different because you will be using different embeddings.

Please post code and output pictures on GitHub (or Colab), and share links to your code on Canvas.

Here is a Colab link that includes some of the hints below: my colab.

Suggested steps:

input name and output list of candidates and their papers
input paper id and output embeddings
compute pairwise similarities
plot similarities

Suggested improvements:

Sort candidate authors by citations.
Sort candidate papers by citations.
Limit candidate authors and candidate papers to n-best, for some reasonable value of n.
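The suggested improvements amount to two sorts and a truncation. The following is a minimal sketch. It assumes the JSON shape returned by the author search in Step 1: a "data" list of authors, each with a "citationCount" and a "papers" list whose entries also carry "citationCount". The names top_candidates, n_authors and n_papers are hypothetical, and the defaults of 5 and 10 are arbitrary stand-ins for "some reasonable value of n".

```python
def top_candidates(search_result, n_authors=5, n_papers=10):
    """Hypothetical helper: keep the n_authors most-cited candidate authors,
    each with their n_papers most-cited papers.

    Assumes search_result looks like {"data": [{"authorId": ..., "name": ...,
    "citationCount": ..., "papers": [{"paperId": ..., "citationCount": ...}]}]}.
    """
    def cites(item):
        # citationCount may be missing or null; treat that as 0 for sorting.
        return item.get("citationCount") or 0

    authors = sorted(search_result.get("data", []), key=cites, reverse=True)
    best = []
    for a in authors[:n_authors]:
        papers = sorted(a.get("papers", []), key=cites, reverse=True)
        # Build a new dict so the original search result is left unchanged.
        best.append({**a, "papers": papers[:n_papers]})
    return best
```

Sorting before truncating means that "n-best" is measured by citations. This tends to keep the well-known authors and papers that make the clusters in the plot easy to recognize.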

Some Useful Python Packages

Hints: the following Python packages may be useful:

import json
import requests
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib
import matplotlib.pyplot as plt

Documentation of Background Material

You won't need most of these tutorials for this homework, but they are good things to know and will be useful later in this class. Some of them are very open-ended.

numpy: Python tools for arrays
requests: Python tools to call URLs
json: Python tools to parse and produce JSON, the format returned by API calls such as Semantic Scholar's
Semantic Scholar API: tools to request fields from Semantic Scholar; fields include papers, authors, embeddings and more
sklearn: Python tools for machine learning
SciPy: Python tools for scientific computing (linear algebra, optimization, statistics, and more)
NetworkX: network (graph) analysis in Python
HuggingFace: a popular hub for models, datasets and tutorials on deep nets for natural language (and more)
matplotlib: Python tools for plotting
imshow: part of matplotlib; displays a 2-D array, such as a similarity matrix, as an image
GitHub: tutorial on GitHub

Step 1: Input name and output list of candidates and their papers

params = {'query': 'David Madigan', 'fields': 'name,citationCount,papers,papers.citationCount'}
j = requests.get('https://api.semanticscholar.org/graph/v1/author/search', params=params).json()
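To turn that response into a list of candidates and their papers, you can wrap the call in a function and then walk the "data" list. The helper names search_authors and list_candidates are hypothetical. The response shape described in the docstring reflects the fields requested above, but treat it as an assumption and check it against a real response. The requests import sits inside the network function so that the parsing helper can be used offline.

```python
# Hypothetical helpers (not part of the assignment or any library).
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/author/search"
FIELDS = "name,citationCount,papers,papers.citationCount"


def search_authors(name, fields=FIELDS):
    """Query Semantic Scholar for authors matching `name` (needs network access)."""
    import requests  # only needed for the network call itself

    # Passing the query via params lets requests URL-encode the space in the name.
    r = requests.get(SEARCH_URL, params={"query": name, "fields": fields})
    r.raise_for_status()  # surface HTTP errors such as rate limiting
    return r.json()


def list_candidates(j):
    """Return one (authorId, name, citationCount, paperIds) tuple per candidate.

    Assumes j looks like {"data": [{"authorId": ..., "name": ...,
    "citationCount": ..., "papers": [{"paperId": ..., ...}, ...]}, ...]}.
    """
    candidates = []
    for a in j.get("data", []):
        paper_ids = [p["paperId"] for p in a.get("papers", []) if p.get("paperId")]
        # A missing or null citationCount is treated as 0.
        candidates.append((a.get("authorId"), a.get("name"),
                           a.get("citationCount") or 0, paper_ids))
    return candidates
```

For example, list_candidates(search_authors("Noah Smith")) should give one tuple per author record that matches the name. Printing these tuples is a quick way to see how many distinct records share the name before you fetch any embeddings.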

Step 2: Input paper ID and output embeddings

p = requests.get('https://api.semanticscholar.org/graph/v1/paper/00707ba45ffe6efa08a59693c47801211ca634d6?fields=title,embedding,citationCount').json()
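Not every paper comes back with an embedding, so it helps to extract the vectors defensively and stack the available ones into a matrix. This sketch assumes the embedding arrives as {"embedding": {"model": ..., "vector": [...]}} and can be null for some papers. It also assumes that every response includes "paperId". The helper names fetch_paper, embedding_vector and embedding_matrix are hypothetical. Semantic Scholar rate-limits unauthenticated requests, so fetching many papers one by one may be slow or fail. The raise_for_status() call makes such failures visible instead of silently returning an error message as if it were a paper.

```python
import numpy as np

PAPER_URL = "https://api.semanticscholar.org/graph/v1/paper/"  # hypothetical constant name


def fetch_paper(paper_id, fields="title,citationCount,embedding"):
    """Request one paper from the Semantic Scholar Graph API (needs network access)."""
    import requests  # only needed for the network call itself

    r = requests.get(PAPER_URL + paper_id, params={"fields": fields})
    r.raise_for_status()
    return r.json()


def embedding_vector(p):
    """Return the paper's embedding as a 1-D float array, or None if it is missing."""
    emb = p.get("embedding")
    if not emb or not emb.get("vector"):
        return None
    return np.asarray(emb["vector"], dtype=float)


def embedding_matrix(papers):
    """Stack the embeddings of papers that have one; return (paperIds, matrix)."""
    ids, rows = [], []
    for p in papers:
        v = embedding_vector(p)
        if v is not None:  # skip papers without an embedding
            ids.append(p.get("paperId"))
            rows.append(v)
    X = np.vstack(rows) if rows else np.empty((0, 0))
    return ids, X
```

Keep the returned ids next to the matrix. Once papers without embeddings are dropped, row i of the matrix no longer matches position i in the original paper list.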

Step 3: Compute pairwise similarities

See here
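Cosine similarity between papers i and j is x_i · x_j / (‖x_i‖ ‖x_j‖), where x_i is the embedding of paper i. The numpy sketch below computes every pair at once by normalizing the rows and taking a matrix product. It should agree with sklearn's cosine_similarity(X), which the hints import. It is written out by hand here only to show what that call computes. The name pairwise_cosine is hypothetical.

```python
import numpy as np


def pairwise_cosine(X):
    """Return the n x n matrix of cosine similarities between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # All-zero rows would divide by zero; leave them as zero vectors instead,
    # so their similarity to everything is 0.
    norms[norms == 0] = 1.0
    Xn = X / norms  # each nonzero row now has unit length
    return Xn @ Xn.T
```

For example, pairwise_cosine([[1, 0], [0, 1], [1, 1]]) gives 1 on the diagonal, 0 between the first two rows, and about 0.707 between either of them and the third.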

Step 4: Plot similarities

There are many tutorials on imshow such as this.
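If the papers are ordered author by author, the similarity matrix becomes a grid of blocks. Drawing lines between the blocks and labelling each block with its author makes it easier to see which candidates are really the same person. This is a sketch under that assumption. Here counts is the number of papers kept for each author, in plot order. The names block_layout and plot_similarities, the colormap and the output filename are all illustrative choices, not requirements. matplotlib is imported inside the plotting function because only the drawing step needs it.

```python
import numpy as np


def block_layout(counts):
    """Return (boundaries, centers) in imshow pixel coordinates.

    boundaries: positions of the lines between consecutive authors' blocks.
    centers: middle of each author's block, used for tick labels.
    """
    counts = np.asarray(counts, dtype=int)
    ends = np.cumsum(counts)
    starts = ends - counts
    boundaries = ends[:-1] - 0.5  # imshow pixel i spans [i - 0.5, i + 0.5]
    centers = starts + (counts - 1) / 2.0
    return boundaries, centers


def plot_similarities(S, author_names, counts, path="similarities.png"):
    """Draw S with imshow, separate authors with lines, and save to `path`."""
    import matplotlib.pyplot as plt  # only needed for drawing

    boundaries, centers = block_layout(counts)
    fig, ax = plt.subplots(figsize=(8, 8))
    im = ax.imshow(S, cmap="viridis")
    for b in boundaries:
        ax.axhline(b, color="white", linewidth=1)
        ax.axvline(b, color="white", linewidth=1)
    ax.set_xticks(centers)
    ax.set_xticklabels(author_names, rotation=90)
    ax.set_yticks(centers)
    ax.set_yticklabels(author_names)
    fig.colorbar(im, ax=ax, label="cosine similarity")
    fig.tight_layout()
    fig.savefig(path)
    return fig
```

In Colab, the figure should display inline if you call plt.show() or leave the returned figure as the last expression in a cell. The saved PNG is what you would post on GitHub.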