Your shared draft paper is in Google Docs. Before emailing co-authors a summary update, you want to pull the current abstract. In your current workflow, how long does that take?
search_sheets from Day 19 finds a Sheet by title. For Docs I'd expect GOOGLEDOCS_GET_DOCUMENT_BY_ID — the document ID is in the URL when I open it. I copy it once and use it in the script.
Correct. The document ID is the long string in the Docs URL between /d/ and /edit. Pass it to the action and you get a dict with the document title, body content, and revision info. The body is structured — paragraphs with elements inside them — but you can extract plain text with a simple traversal:
result = toolset.execute_action(
    Action.GOOGLEDOCS_GET_DOCUMENT_BY_ID,
    {"document_id": doc_id}
)

Why is the body "structured" instead of plain text? Can't Docs just return a string?
Google Docs stores rich text — bold, headers, links, tables. The API returns the document structure as nested JSON so you can extract exactly what you need. For a plain text extraction, iterate the paragraphs: each paragraph element has a list of text run elements with a content field. The full content is there — you just traverse the tree:
def read_doc(doc_id: str) -> dict:
    result = toolset.execute_action(
        Action.GOOGLEDOCS_GET_DOCUMENT_BY_ID,
        {"document_id": doc_id}
    )
    print(f"Document: {result.get('title', 'unknown')}")
    return result

So I get the full document structure back and can extract the abstract paragraph, the title, or the introduction section separately. That's more useful than a flat text dump.
The co-author who keeps emailing asking for the latest abstract is about to be pleasantly surprised by a script that emails it automatically.
Three weeks in and I can read my inbox, my calendar, my task list, my citations Sheet, and my draft paper — all from Python. The capstone pipeline is almost visible.
Document IDs are stable — the ID in the URL doesn't change when you rename the file or move it in Drive. Hardcode the document ID for your most important research documents. For files you discover at runtime, use the Drive search pattern from search_sheets.
result = toolset.execute_action(
    Action.GOOGLEDOCS_GET_DOCUMENT_BY_ID,
    {"document_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms"}
)

In the URL: https://docs.google.com/document/d/**{document_id}**/edit
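Since the ID always sits between /d/ and the next slash, you can also pull it out of a pasted URL instead of copying it by hand. A small convenience helper — the function name is an assumption, not part of the lesson:

```python
import re

def doc_id_from_url(url: str) -> str:
    """Extract the document ID from a URL of the form
    https://docs.google.com/document/d/<id>/edit"""
    match = re.search(r"/d/([a-zA-Z0-9_-]+)", url)
    if not match:
        raise ValueError(f"No document ID found in: {url}")
    return match.group(1)
```

Handy when a co-author pastes a Docs link into chat: pass the whole URL and hand the extracted ID to the action.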
| Field | Content |
|---|---|
| title | Document title |
| body.content | List of paragraph elements |
| revisionId | Latest revision identifier |
body = result.get("body", {}).get("content", [])
text = "".join(
    run.get("textRun", {}).get("content", "")
    for elem in body
    for par in [elem.get("paragraph", {})]
    for run in par.get("elements", [])
)
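The same traversal packaged as a reusable function, so any script in the pipeline can flatten a fetched document to plain text. A sketch that assumes only the paragraph/textRun nesting described above:

```python
def doc_to_text(result: dict) -> str:
    """Flatten a GOOGLEDOCS_GET_DOCUMENT_BY_ID result to plain text.

    Non-paragraph elements (tables, section breaks) are skipped:
    their dicts lack a "paragraph" key, so the inner loop is empty.
    """
    return "".join(
        run.get("textRun", {}).get("content", "")
        for elem in result.get("body", {}).get("content", [])
        for run in elem.get("paragraph", {}).get("elements", [])
    )
```

Combined with `read_doc`, that gives `doc_to_text(read_doc(doc_id))` as a one-liner for "give me the whole draft as a string".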