TASK_URL_READER

Function Overview

The TASK_URL_READER function fetches content from a specified URL and places the result in a designated cell. This function is useful for importing web content directly into your spreadsheet for analysis, processing, or reference.

Uses

  1. Fetch article text from websites for content analysis
  2. Import product descriptions from e-commerce pages for comparison
  3. Capture financial data from public websites into your spreadsheet
  4. Retrieve documentation or technical specifications from online resources
  5. Monitor webpage changes by periodically fetching and comparing content
  6. Convert HTML content to markdown for better readability in spreadsheets
  7. Take screenshots of websites for visual reference
  8. Extract data from websites that don't offer a formal API
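Use case 5 above (monitoring webpage changes) works by fetching periodically and comparing results. A minimal sketch of the compare step, done outside the sheet: hash each fetched copy and compare digests instead of full page text. This is illustrative only and not part of TASK_URL_READER itself.

```python
import hashlib

def content_fingerprint(content: str) -> str:
    """Hash fetched page text so successive fetches can be compared cheaply.

    Store the hex digest alongside the fetched content, then re-fetch
    (with use_cache=false) and compare digests to detect changes.
    """
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def has_changed(old_digest: str, new_content: str) -> bool:
    """True if the newly fetched content differs from the stored copy."""
    return content_fingerprint(new_content) != old_digest
```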

Syntax and Parameters

=TASK_URL_READER(header, url, [return_format], [use_cache], [celladdress])

| Parameter     | Data Type | Required | Description |
|---------------|-----------|----------|-------------|
| header        | string    | Yes      | Column header for the task template |
| url           | string    | Yes      | The URL whose content to fetch (e.g., "https://www.example.com") |
| return_format | string    | No       | The format to return the page content in: "default", "text", "html", "markdown", or "screenshot" (defaults to "default") |
| use_cache     | boolean   | No       | Whether to use the cache to load/save the result (defaults to true) |
| celladdress   | string    | No       | The cell address to place the result (e.g., "C1"). If not specified, the cell to the right of the function cell is used |

Advanced Options

Return Format Options

The return_format parameter allows you to control how the content is retrieved and formatted:

  • default: Returns the raw content as determined by the URL service
  • text: Returns plain text content with HTML tags removed
  • html: Returns the raw HTML content
  • markdown: Converts HTML to markdown for better readability
  • screenshot: Returns a visual snapshot of the webpage
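Since the FAQ notes that content is fetched through the r.jina.ai proxy, the return_format values plausibly map onto that service's request headers. A sketch of that mapping, assuming Jina Reader's public `X-Return-Format` header (an assumption; verify against the service's current documentation):

```python
# Sketch: map TASK_URL_READER's return_format values onto an r.jina.ai
# Reader request. The X-Return-Format header name is an assumption
# based on Jina's public Reader API, not confirmed by this document.
VALID_FORMATS = {"default", "text", "html", "markdown", "screenshot"}

def build_reader_request(url: str, return_format: str = "default") -> dict:
    """Return the proxy URL and headers for a hypothetical Reader fetch."""
    if return_format not in VALID_FORMATS:
        raise ValueError(f"unsupported return_format: {return_format!r}")
    headers = {}
    if return_format != "default":
        # "default" lets the service decide; other values are passed through.
        headers["X-Return-Format"] = return_format
    return {"url": f"https://r.jina.ai/{url}", "headers": headers}
```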

Caching & Freshness

The use_cache parameter controls whether previously fetched content is reused:

  • When set to true (default), responses are cached for 6 hours to improve performance
  • Set to false when you need to ensure you're getting the most up-to-date content
  • For long-term storage across sessions, use the Save icon in the sidebar after fetching
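The caching behavior above can be sketched as a simple time-to-live cache keyed by URL and format. This is a minimal illustration of the documented 6-hour lifetime; the real cache lives in the service, not in your sheet:

```python
import time

SIX_HOURS = 6 * 60 * 60  # documented cache lifetime in seconds

class UrlCache:
    """Minimal sketch of a 6-hour response cache keyed by (url, format)."""

    def __init__(self, ttl: float = SIX_HOURS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock    # injectable clock, handy for testing
        self._store = {}       # (url, fmt) -> (fetched_at, content)

    def get(self, url: str, fmt: str):
        """Return cached content, or None if absent or older than the TTL."""
        entry = self._store.get((url, fmt))
        if entry is None:
            return None
        fetched_at, content = entry
        if self._clock() - fetched_at > self._ttl:
            # Expired: behave as if use_cache=false and force a re-fetch.
            del self._store[(url, fmt)]
            return None
        return content

    def put(self, url: str, fmt: str, content: str):
        self._store[(url, fmt)] = (self._clock(), content)
```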

Output Cell Targeting

The celladdress parameter lets you control where the fetched content is placed:

  • Use a cell reference like "C1" to specify an exact location
  • If omitted, the function places the result in the cell to the right of the function cell
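The default target ("the cell to the right") can be computed from A1 notation by incrementing the column letters. A sketch of that arithmetic, for illustration only; the spreadsheet engine handles this internally:

```python
import re

def cell_to_the_right(address: str) -> str:
    """Given an A1-style address, return the address one column to the
    right -- the default output cell when celladdress is omitted."""
    m = re.fullmatch(r"([A-Z]+)([1-9]\d*)", address.upper())
    if not m:
        raise ValueError(f"not A1 notation: {address!r}")
    col, row = m.groups()
    # Column letters to a number (A=1 ... Z=26, AA=27, ...).
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    n += 1  # move one column right
    # ...and back to letters.
    out = ""
    while n:
        n, rem = divmod(n - 1, 26)
        out = chr(rem + ord("A")) + out
    return f"{out}{row}"
```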

Examples

Basic URL Fetching

Fetch the content of a webpage and convert it to markdown:

=TASK_URL_READER("Web Content", "https://www.example.com", "markdown")

After clicking Run in the sidebar, the markdown content will appear in the cell to the right of the formula.

Capturing a Screenshot

Take a screenshot of a website and place it in a specific cell:

=TASK_URL_READER("Screenshot", "https://www.google.com", "screenshot", true, "D5")

After clicking Run, a visual snapshot of the webpage will appear in cell D5.

Fetching Fresh Data

Ensure you get the latest content without using the cache:

=TASK_URL_READER("Stock Data", "https://finance-site.com/stock/AAPL", "text", false)

After clicking Run, the latest stock data will be fetched and placed in the adjacent cell.

Importing HTML Content

Retrieve the raw HTML from a webpage for further processing:

=TASK_URL_READER("Raw HTML", "https://developer.mozilla.org", "html")

After clicking Run, the HTML source will appear in the cell to the right of the formula.

Best Practices

  • After entering the function, select the empty cells in its column and click the Run button in the Sheet Workflows sidebar to execute the URL fetch
  • Use the "markdown" format for better readability of fetched content
  • Enable caching (use_cache=true) to improve performance and reduce API usage
  • Set use_cache=false when you need the most current version of a frequently updated website
  • Be mindful that large webpages may be truncated if their content exceeds 32,760 characters
  • Prefer targeted pages or API endpoints over large full webpages when possible
  • For sensitive data, ensure the URLs you're fetching don't contain personal or confidential information
  • Respect website terms of service and robots.txt files when fetching content
  • Use descriptive header values in your formulas so it's clear what data each column imports

Troubleshooting

Error: Invalid URL

Cause: The URL provided is not properly formatted or is empty.
Solution: Ensure the URL includes the full protocol (http:// or https://) and is properly formatted.
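The "full protocol" requirement above can be checked before entering the formula. A sketch of that validation using Python's standard library (illustrative; the actual error messages come from the function itself):

```python
from urllib.parse import urlparse

def check_url(url: str) -> str:
    """Reject inputs that would trigger 'Invalid URL': the address must
    include a full http:// or https:// protocol and a host."""
    if not url or not url.strip():
        raise ValueError("Invalid URL: empty")
    cleaned = url.strip()
    parts = urlparse(cleaned)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Invalid URL: {cleaned!r} (include http:// or https://)")
    return cleaned
```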

Error: Unable to access the URL

Cause: The URL may be behind a login, blocked, or inaccessible.
Solution: Verify the URL is publicly accessible and doesn't require authentication.

Error: Content truncated

Cause: The webpage content exceeds the 32,760 character limit.
Solution: Consider fetching specific parts of the page or using a more targeted URL.
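The truncation rule is simple to model: content over the limit is cut and suffixed with "..." (per the FAQ below). A sketch:

```python
MAX_CHARS = 32_760  # documented content limit per result

def truncate_content(content: str, limit: int = MAX_CHARS) -> str:
    """Sketch of the documented truncation: content over the limit is
    cut so that, with the '...' suffix, it fits exactly within it."""
    if len(content) <= limit:
        return content
    return content[: limit - 3] + "..."
```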

Error: Function execution timed out

Cause: The webpage took too long to load or process.
Solution: Try a different return format or check if the website is responding slowly.

Error: Invalid cell address

Cause: The specified cell address is not in the correct format.
Solution: Use proper A1 notation (e.g., "C1") or leave blank to use the default behavior.

Warning: Cached content may be outdated

Cause: You're seeing old content due to caching.
Solution: Set use_cache=false to fetch fresh content.

FAQ

How do I know when the function has completed fetching the URL?

The function will show "Running..." while it's executing. When complete, the result will appear in the target cell. For large pages or slow connections, this may take a few seconds.

Can I fetch content from password-protected sites?

No, TASK_URL_READER can only access publicly available content that doesn't require authentication.

What is the maximum amount of content I can fetch?

The function has a limit of 32,760 characters. If the content exceeds this limit, it will be truncated with "..." at the end.

Why is my content showing as an error message?

If you see "Error:" followed by a message, it indicates a problem with fetching the URL. Check that the URL is valid, publicly accessible, and that you have an active internet connection.

How frequently can I refresh URL content?

While there's no strict limit, it's good practice to limit refreshes to once every few minutes to avoid overloading either your sheet or the target website.

Can I schedule automatic refresh of URL content?

Yes, you can set up a time-driven trigger that runs a script to execute the TASK_URL_READER functions in your sheet. This requires creating a custom script that calls the RUN_CELL function.

Does TASK_URL_READER work with all websites?

Most websites should work, but some may block automated access or require specific headers that the function doesn't provide.

Can I extract specific elements from a webpage?

The function returns the entire webpage content. For more selective extraction, you may need to use TASK_LM or TASK_ROW_UPDATE with a prompt to extract specific information from the fetched content.

How is the screenshot format generated?

When using return_format="screenshot", the service renders the webpage and returns a visual representation. This is useful for capturing the visual state of a website.

What service does TASK_URL_READER use?

The function uses r.jina.ai as a proxy service to fetch and format web content, which enables features like markdown conversion and screenshots.

Elisha

1 week ago
