
Wikipedia:Bots/Requests for approval/AppstudiosBot@snippet fetch



Operator: Appstudiobot (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 22:55, Wednesday, November 5, 2025 (UTC)

Function overview: A read-only bot for a one-time batch job to fetch ~1.2 million article summaries. The data will populate an external database, which will reduce future API load on Wikipedia.

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: Yes

Links to relevant discussions (where appropriate): (This is a technical request for apihighlimits for a read-only task; no prior consensus discussion exists.)

Edit period(s): One-time run

Estimated number of pages affected: 0 (zero). This bot performs no edits.

Namespace(s): None. This bot is read-only.

Exclusion compliant (Yes/No): Yes (Bot is read-only and makes no edits).

Function details: This is a one-time batch job to fetch the first paragraph (intro summary) of approximately 1.2 million articles in the "Coordinates on Wikidata" tracking category. The data will populate a database for an external application, which will lessen the continuous API load that application would otherwise place on Wikipedia.

API: MediaWiki Action API (action=query, prop=extracts)

Mode: Read-only. The bot will make no edits.

Frequency: This is a one-time run. It is not a continuous or recurring task.

Speed: The script will run at a polite rate (e.g., 1 request per second) and will set maxlag=5, backing off whenever the API reports server lag.

Rationale: At the standard API limit of 50 titles per request, this one-time task would require ~24,000 requests. With the apihighlimits right (500 titles per request), the same task needs only ~2,400 requests. This is 10x more efficient, finishes in a fraction of the wall-clock time at the planned 1 request per second, and places a significantly lower load on the API servers for this large, one-off read. A sketch of the planned fetch loop follows.
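
For reference, a minimal sketch of the planned read loop (Python, using the requests library). The API parameters (action=query, list=categorymembers, prop=extracts, maxlag) are standard Action API parameters; the batch size, user-agent string, retry policy, and database write step are placeholders, not final implementation details.

import time
import requests

API = "https://en.wikipedia.org/w/api.php"
# Placeholder user agent; the real one will identify the bot and operator contact.
HEADERS = {"User-Agent": "AppstudiosBot/0.1 (operator contact on bot user page)"}
BATCH = 500   # 500 titles per request assumes apihighlimits; 50 otherwise
DELAY = 1.0   # polite rate: ~1 request per second

session = requests.Session()
session.headers.update(HEADERS)

def api_get(params):
    """One API call with maxlag=5; back off and retry if the servers are lagged."""
    params = dict(params, format="json", formatversion=2, maxlag=5)
    while True:
        resp = session.get(API, params=params, timeout=60)
        data = resp.json()
        if data.get("error", {}).get("code") == "maxlag":
            # Servers are lagged: wait as advised by Retry-After, then retry.
            time.sleep(int(resp.headers.get("Retry-After", 5)))
            continue
        time.sleep(DELAY)
        return data

def category_members(category):
    """Yield main-namespace page IDs from the tracking category, following continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": 0,
        "cmlimit": "max",
    }
    while True:
        data = api_get(params)
        for page in data["query"]["categorymembers"]:
            yield page["pageid"]
        if "continue" not in data:
            break
        params.update(data["continue"])

def fetch_summaries(pageids):
    """Fetch plain-text intro extracts for one batch of page IDs.
    prop=extracts may return results in slices, so continuation is followed here too."""
    params = {
        "action": "query",
        "prop": "extracts",
        "pageids": "|".join(str(p) for p in pageids),
        "exintro": 1,
        "explaintext": 1,
        "exlimit": "max",
    }
    summaries = {}
    while True:
        data = api_get(params)
        for page in data["query"]["pages"]:
            if "extract" in page:
                summaries[page["pageid"]] = page["extract"]
        if "continue" not in data:
            break
        params.update(data["continue"])
    return summaries

batch = []
for pageid in category_members("Category:Coordinates on Wikidata"):
    batch.append(pageid)
    if len(batch) == BATCH:
        summaries = fetch_summaries(batch)
        # ... write summaries to the external database here (placeholder) ...
        batch = []
if batch:
    fetch_summaries(batch)  # final partial batch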

Discussion
