Douwe Osinga's Blog: Automate Research with a Neptyne Spreadsheet and OpenAI

Wednesday, July 19, 2023

 While ChatGPT definitely has grabbed the headlines when it comes to the AI revolution, using LLMs to automate all kinds of tasks has yielded some interesting results too. A number of frameworks have sprung up like LangChain, AutoGPT and BabyAGI. They allow users to go beyond the simple chat interface and connect all kinds of components to GPT3.5/4, like web search, memory stores or even code writing. Very powerful tools, but not the easiest things to get started with.



In this post we’ll show how you can achieve similar results in a Neptyne spreadsheet without having to worry about deployment. We’re going to create a spreadsheet that autonomously does research. It’s fully customizable and should work for any type of task, but in our current example we’ll focus on researching AI startup funding news. It looks like this::



You specify a (news) search in C2 and the freshness in G2. Column headers E4 to H4 are also configurable and determine the information we want to extract from the articles we find.


After you hit the “go” button, the automatic research will start by sending the search query to Bing’s news search. For each result it will call out to an external service to render the page, running javascript and everything. It will then extract the interesting information from that news article and feed it into ChatGPT, asking that service to find information for each of the specified columns (in this case Company, Amount, CEO and Investors). It will then add a row to the spreadsheet of the information it found.



Get your own research bot


Ready to create your own research AI bot? The bot combines three different services so you will need to sign up for those and you’ll need of course a Neptyne account.


  • Bing News Search. We use this to get a list of news articles based on the query you entered into C2. If you have a Microsoft Azure account, setting this up is fairly straightforward. Comes with a free tier for 1000 searches per month. 

  • PhantomJsCloud. This service takes a url and renders it as html in the cloud. Just getting the html of a document is not enough anymore today. This step is actually the slowest in our pipeline - rendering a modern web page can take time. Sign up is free and the free tier gets you 500 page loads per day.

  • OpenAI. The current code uses ChatGPT 3.5 - you can switch to 4 if you feel like it is missing things, but it’ll be slower and more expensive. 


Sign up for all services and note the api key for each. Now navigate to: https://app.neptyne.com/-/tgjqzmjbfi and make a copy. When you hit the Go button, the system will ask you for the keys and once you’ve entered those, it will start running. You can interrupt the current run by hitting the button again, but it will take a little while since it will finish the current task.

How does this work?

The main code is called from the button Go and lives in the run() method. Here’s the slightly simplified code:


for item in news_search(C2, freshness=G2.value)['value']:

    article = fetch_article(item['url'])

    title, summary = extract_content(article)

    keywords = get_keywords(summary, E4:H4)



news_search returns a list of news articles. fetch_article calls PhantomJsCloud to get a rendered version of the article. extract_content uses the python-readability library to strip all non content from an html text. Finally get_keywords uses OpenAI to extract the keywords from the article. Most of these are pretty straightforward but let’s have a closer look at get_keywords:


def get_keywords(body, keywords):

    prompt = "Given this article:\n\n"

    prompt += body + "\n\n"

    prompt += "Generate a dictionary of key/value pairs in json with keys:"

    keywords = ['"' + kw + '"' for kw in keywords]

    prompt += ", ".join(keywords)

    prompt += "\nLeave out what cannot be found"

    return call_open_ai([{"role": "user", "content": prompt}])


All it does is build a prompt asking for a json document with the key/values for the columns we specify in the spreadsheet. The AI magic does the rest.


There’s a bit of data massaging going on of course, but once we have the data from the article and the keywords, we just insert a new line into the spreadsheet with:


B5:H.insert_row(

     0, [item['datePublished'], item['description'], item['url'], *keywords]

)


Conclusion

The integration of Neptyne spreadsheets with OpenAI opens a vast world of possibilities, from autonomously conducting research to data extraction and analysis. This article provided a detailed walkthrough of how to use the Neptyne spreadsheet in conjunction with various services like Bing News Search, PhantomJsCloud, and OpenAI to create an AI research bot. The bot seamlessly retrieves and parses news articles based on user-specific queries, applying AI to identify key data points from each source. By combining these different services and leveraging their unique capabilities, users can achieve a more automated, efficient, and effective research process.


0 comments: