TASK_URL_READER

Knowledgebase

TASK_URL_READER: Extract and Process Web Content in Google Sheets

Function Overview

The

TASK_URL_READER

function allows you to fetch and display content from any website directly in your Google Sheets. It pulls information from a specified URL and places the result in a designated cell, making it easy to incorporate web data into your spreadsheets without manual copying and pasting. This function uses the Jina reader service, which offers a free plan with rate limitations and a paid plan for higher volume needs.

Uses

Extract article content for analysis or summarization.
Monitor pricing information from product pages.
Import documentation from websites into your spreadsheet.
Gather data from multiple web sources for comparison.
Feed website content into AI language models (like
TASK_LM
) for processing.
Extract information from websites with restrictive APIs.
Create a database of information collected from various websites.

Syntax and Parameters

TASK_URL_READER

(url, [return_format], [use_cache], [celladdress])

Parameter	Data Type	Required	Description
url	`string`	Required	The web address of the page you want to extract (e.g., "https://www.example.com")
return_format	`string`	Optional	How you want the content formatted. Options include `"default"` , `"text"` , `"html"` , `"markdown"` , `"screenshot"`
use_cache	`boolean`	Optional	Whether to save and reuse previous results (default: `true` )
celladdress	`string`	Optional	Where to place the output (e.g., "C1"). If not specified, results appear in the cell to the right of the function

Advanced Options

Return Format Options

Text Format (
`"text"`
)

Returns just the readable text from the webpage with minimal formatting:

=TASK_URL_READER("https://www.example.com", "text")

This is ideal when you only need the raw text content without any HTML tags or formatting.

HTML Format (
`"html"`
)

Returns the complete HTML code of the webpage:

=TASK_URL_READER("https://www.example.com", "html")

Use this when you need to analyze the page structure or extract specific HTML elements.

Markdown Format (
`"markdown"`
)

Converts the webpage content to markdown formatting:

=TASK_URL_READER("https://www.example.com", "markdown")

This preserves basic formatting like headings, lists, and links in a clean, readable format.

Screenshot Format (
`"screenshot"`
)

Returns a URL to a screenshot image of the webpage:

=TASK_URL_READER("https://www.example.com", "screenshot")

The screenshot option is particularly useful when:

The website's HTML is difficult to scrape
You want to capture the visual layout of a page
You need to feed the visual content to AI models that support image processing

When combined with

TASK_LM

, you can use this to analyze visual content:

=TASK_LM("Describe what you see in this image: " & TASK_URL_READER("https://www.example.com", "screenshot"))

Cache Control

The cache system stores results for 6 hours to improve performance and reduce API calls:

=TASK_URL_READER("https://www.example.com", "default", true)

To always fetch fresh content, set

use_cache

false

=TASK_URL_READER("https://www.example.com", "default", false)

Use the Save icon in the sidebar for long-term storage of cached results across sessions.

Cell Address Targeting

By default, results appear in the cell to the right of the function. To specify a different output location:

=TASK_URL_READER("https://www.example.com", "default", true, "G10")

You can also specify just a column, and the function will use the current row:

=TASK_URL_READER("https://www.example.com", "default", true, "G")

Examples

Basic Usage

=TASK_URL_READER("https://www.wikipedia.org")

This places Wikipedia's homepage content in the cell to the right of the formula.

News Article Extraction

=TASK_URL_READER("https://www.bbc.com/news/world-65360307", "text", true, "B5")

This extracts a BBC news article as plain text and places it in cell B5.

Product Information

=TASK_URL_READER("https://www.amazon.com/dp/B08N5KWB9H", "markdown")

This fetches an Amazon product page in markdown format.

Documentation Reference

=TASK_URL_READER("https://developers.google.com/apps-script/reference/spreadsheet/sheet", "text", false)

This gets fresh (non-cached) Google Apps Script documentation as plain text.

Visual Capture for AI Analysis

=TASK_URL_READER("https://weather.com/weather/today/l/San+Francisco+CA", "screenshot")

This captures a screenshot of today's weather in San Francisco, which could then be analyzed by an AI model.

Best Practices

Keep
use_cache
set to
true
for most cases to avoid unnecessary API calls.
Only use
false
when you need real-time updates.
Use
"text"
for simple content extraction.
Use
"markdown"
when you need basic formatting preserved.
Use
"html"
when you need to analyze page structure.
Use
"screenshot"
when dealing with heavily JavaScript-dependent sites or when you need visual context.
For large extractions, place outputs in unused areas of your sheet.
Consider using separate sheets for content extraction.
Remember that
TASK_URL_READER
uses the Jina reader service which has rate limits on the free plan.
For high-volume needs, consider upgrading to Jina's paid plan.
Use
TASK_URL_READER
's output as input for functions like
TASK_LM
to analyze content.

Troubleshooting

"Error: Network call failed with response code XXX"

Cause: The target website is blocking access or is unavailable.
Solution: Try using a different URL or try again later. Some websites have measures to prevent automated access.

"Error: Response exceeds maximum length"

Cause: The content is too large (over 32,760 characters).
Solution: The function automatically truncates content. Try using a more specific URL that targets a smaller portion of content.

"Invalid cell address"

Cause: The
celladdress
parameter format is incorrect.
Solution: Ensure you're using a valid cell reference in quotes, like "C5" or "B10".

"502 Bad Gateway" or "429 Too Many Requests"

Cause: Rate limits with the Jina reader service.
Solution: Reduce the frequency of requests or consider upgrading to the Jina paid plan.

Nothing happens after running the function

Cause: You need to click the "Run" button in the Sheet Workflows sidebar.
Solution: After entering the function, look for the "Run" button in the Sheet Workflows sidebar.

FAQ

How often is the cache refreshed?

The cache lasts for 6 hours by default. After that time, the function will fetch fresh content the next time it runs.

Generally no. The function cannot access content behind login screens or paywalls unless the site allows public access.

Does this function work with dynamically loaded content?

It depends. Some JavaScript-rendered content may not be captured in text/HTML modes. Try the "screenshot" mode for sites with heavy JavaScript components.

Will this function slow down my spreadsheet?

The caching mechanism minimizes performance impact. However, extracting large amounts of content from multiple URLs may cause some slowdown.

Is there a limit to how many URLs I can process?

Yes, the Jina reader service has rate limitations on their free plan. For high-volume needs, consider their paid plan.

Can I use this function with password-protected or internal websites?

No, the function can only access publicly available content that doesn't require authentication.

How do I use the extracted content with language models?

Use the output cell from

TASK_URL_READER

as input for

TASK_LM

=TASK_LM("Summarize this article: " & B5)

(Assuming B5 contains the

TASK_URL_READER

output)

Can I extract specific parts of a webpage instead of the entire page?

Not directly. However, you can extract the full page and then use other functions like

TASK_LM

to parse specific sections.

elisha

7 months ago

Activity

Elisha changed status to Knowledgebase

3 months ago

Elisha changed status to New

3 months ago

TASK_URL_READER

TASK_URL_READER: Extract and Process Web Content in Google Sheets

Function Overview

Uses

Syntax and Parameters

Advanced Options

Return Format Options

Text Format ("text")

HTML Format ("html")

Markdown Format ("markdown")

Screenshot Format ("screenshot")

Cache Control

Cell Address Targeting

Examples

Basic Usage

News Article Extraction

Product Information

Documentation Reference

Visual Capture for AI Analysis

Best Practices

Troubleshooting

"Error: Network call failed with response code XXX"

"Error: Response exceeds maximum length"

"Invalid cell address"

"502 Bad Gateway" or "429 Too Many Requests"

Nothing happens after running the function

FAQ

How often is the cache refreshed?

Can I extract content from sites that require login?

Does this function work with dynamically loaded content?

Will this function slow down my spreadsheet?

Is there a limit to how many URLs I can process?

Can I use this function with password-protected or internal websites?

How do I use the extracted content with language models?

Can I extract specific parts of a webpage instead of the entire page?

Activity

Categories

Text Format (
`"text"`
)

HTML Format (
`"html"`
)

Markdown Format (
`"markdown"`
)

Screenshot Format (
`"screenshot"`
)