Python Markdown using MkDocs: Simple RSS Reader and Website

A simple example that uses t-python-markdown and MkDocs to create a static website.

⏲ This should take 15-30 minutes to complete.

This is not meant to be a tutorial for MkDocs, but should give you a basic setup to work from.

Prerequisites

This example has been written and tested on Linux.

Before you start, please ensure that both Python and virtualenv are installed and available.

See this Fedora article which has more details on python and virtual environments.

Installation

Use the following to initialise and configure a basic MkDocs setup. First open a terminal and change to a directory where the folder python-markdown-mkdocs will be created. Copy the following, paste it into the terminal and then press «enter»:

virtualenv python-markdown-mkdocs
cd python-markdown-mkdocs
. ./bin/activate
pip install t-python-markdown mkdocs mkdocs-material bs4 lxml
mkdir -p src/docs
cd src
cat <<EOF >> mkdocs.yml
site_name: RSS
site_url: http://127.0.0.1:8000
theme:
  name: material
markdown_extensions:
  - pymdownx.emoji:
      emoji_index: !!python/name:materialx.emoji.twemoji
      emoji_generator: !!python/name:materialx.emoji.to_svg
EOF
mkdocs serve

You should see something along the lines of:

Terminal Output
$ virtualenv python-markdown-mkdocs
cd python-markdown-mkdocs
. ./bin/activate
pip install t-python-markdown mkdocs mkdocs-material bs4 lxml
mkdir -p src/docs
cd src
cat <<EOF >> mkdocs.yml
site_name: RSS
site_url: http://127.0.0.1:8000
theme:
  name: material
markdown_extensions:
  - pymdownx.emoji:
      emoji_index: !!python/name:materialx.emoji.twemoji
      emoji_generator: !!python/name:materialx.emoji.to_svg
EOF
mkdocs serve
created virtual environment CPython3.11.2.final.0-64 in 205ms
  creator CPython3Posix(dest=/home/kevin/temp/python-markdown-mkdocs, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(extra_search_dir=/usr/share/python-wheels,download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/kevin/.local/share/virtualenv)
    added seed packages: pip==22.2.2, setuptools==62.6.0, wheel==0.37.1
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
Collecting t-python-markdown
  Downloading t_python_markdown-1.2.1-py3-none-any.whl (8.6 kB)
Collecting mkdocs
  Using cached mkdocs-1.4.2-py3-none-any.whl (3.7 MB)
Collecting mkdocs-material
  Downloading mkdocs_material-9.1.2-py3-none-any.whl (7.7 MB)
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.7/7.7 MB 18.3 MB/s eta 0:00:00
Collecting bs4
  Using cached bs4-0.0.1-py3-none-any.whl
Collecting lxml
  Using cached lxml-4.9.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (7.2 MB)
Collecting PyYAML
  Using cached PyYAML-6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (757 kB)
Collecting click>=7.0
  Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting ghp-import>=1.0
  Using cached ghp_import-2.1.0-py3-none-any.whl (11 kB)
Collecting jinja2>=2.11.1
  Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting markdown<3.4,>=3.2.1
  Using cached Markdown-3.3.7-py3-none-any.whl (97 kB)
Collecting mergedeep>=1.3.4
  Using cached mergedeep-1.3.4-py3-none-any.whl (6.4 kB)
Collecting packaging>=20.5
  Using cached packaging-23.0-py3-none-any.whl (42 kB)
Collecting pyyaml-env-tag>=0.1
  Using cached pyyaml_env_tag-0.1-py3-none-any.whl (3.9 kB)
Collecting watchdog>=2.0
  Using cached watchdog-2.3.1-py3-none-manylinux2014_x86_64.whl (80 kB)
Collecting colorama>=0.4
  Using cached colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Collecting mkdocs-material-extensions>=1.1
  Using cached mkdocs_material_extensions-1.1.1-py3-none-any.whl (7.9 kB)
Collecting pygments>=2.14
  Using cached Pygments-2.14.0-py3-none-any.whl (1.1 MB)
Collecting pymdown-extensions>=9.9.1
  Downloading pymdown_extensions-9.10-py3-none-any.whl (235 kB)
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 235.5/235.5 kB 58.5 MB/s eta 0:00:00
Collecting regex>=2022.4.24
  Using cached regex-2022.10.31-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (781 kB)
Collecting requests>=2.26
  Using cached requests-2.28.2-py3-none-any.whl (62 kB)
Collecting beautifulsoup4
  Using cached beautifulsoup4-4.11.2-py3-none-any.whl (129 kB)
Collecting python-dateutil>=2.8.1
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting MarkupSafe>=2.0
  Using cached MarkupSafe-2.1.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27 kB)
Collecting charset-normalizer<4,>=2
  Downloading charset_normalizer-3.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (197 kB)
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 197.3/197.3 kB 21.0 MB/s eta 0:00:00
Collecting idna<4,>=2.5
  Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.14-py2.py3-none-any.whl (140 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2022.12.7-py3-none-any.whl (155 kB)
Collecting soupsieve>1.2
  Using cached soupsieve-2.4-py3-none-any.whl (37 kB)
Collecting six>=1.5
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: watchdog, urllib3, soupsieve, six, regex, PyYAML, pygments, packaging, mkdocs-material-extensions, mergedeep, MarkupSafe, markdown, lxml, idna, colorama, click, charset-normalizer, certifi, t-python-markdown, requests, pyyaml-env-tag, python-dateutil, pymdown-extensions, jinja2, beautifulsoup4, ghp-import, bs4, mkdocs, mkdocs-material
Successfully installed MarkupSafe-2.1.2 PyYAML-6.0 beautifulsoup4-4.11.2 bs4-0.0.1 certifi-2022.12.7 charset-normalizer-3.1.0 click-8.1.3 colorama-0.4.6 ghp-import-2.1.0 idna-3.4 jinja2-3.1.2 lxml-4.9.2 markdown-3.3.7 mergedeep-1.3.4 mkdocs-1.4.2 mkdocs-material-9.1.2 mkdocs-material-extensions-1.1.1 packaging-23.0 pygments-2.14.0 pymdown-extensions-9.10 python-dateutil-2.8.2 pyyaml-env-tag-0.1 regex-2022.10.31 requests-2.28.2 six-1.16.0 soupsieve-2.4 t-python-markdown-1.2.1 urllib3-1.26.14 watchdog-2.3.1

[notice] A new release of pip available: 22.2.2 -> 23.0.1
[notice] To update, run: pip install --upgrade pip
INFO     -  Building documentation...
INFO     -  Cleaning site directory
INFO     -  Documentation built in 0.16 seconds
INFO     -  [21:50:01] Watching paths for changes: 'docs', 'mkdocs.yml'
INFO     -  [21:50:01] Serving on http://127.0.0.1:8000/

Once done, open a browser to http://127.0.0.1:8000. It should show a 404 - Not Found message; this is correct, as no pages have been written yet, and it means the basics are now in place. (If mkdocs serve instead fails with an error mentioning materialx.emoji, you have a newer version of mkdocs-material: replace materialx.emoji with material.extensions.emoji in both emoji lines of mkdocs.yml.)

What The Code Does

This example takes the contents of a number of RSS feeds and presents them across multiple pages, as follows:

flowchart TD
  home[Home]

  home --> feed_1[Feed 1]
  home --> feed_2[Feed 2]
  home --> feed_n[Feed n]

  feed_1 --> feed_1_article_1[Article 1]
  feed_1 --> feed_1_article_2[Article 2]
  feed_1 --> feed_1_article_3[Article n]

  feed_2 --> feed_2_article_1[Article 1]
  feed_2 --> feed_2_article_2[Article 2]
  feed_2 --> feed_2_article_3[Article n]

  feed_n --> feed_n_article_1[Article 1]
  feed_n --> feed_n_article_2[Article 2]
  feed_n --> feed_n_article_3[Article n]

It is meant as a demonstration of how to use t-python-markdown to quickly and easily generate markdown documents.
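Before looking at the full script, it may help to see the shape of the data it extracts from each feed. The script below uses requests and BeautifulSoup; this is a dependency-free sketch of the same extraction step using only the standard library, with a made-up sample feed for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, for illustration only.
SAMPLE = """<rss><channel>
  <title>Example Feed</title>
  <link>https://example.org</link>
  <description>An example feed</description>
  <item>
    <title>First article</title>
    <link>https://example.org/1</link>
    <pubDate>Mon, 13 Mar 2023 09:00:00 GMT</pubDate>
    <description>Hello</description>
  </item>
</channel></rss>"""

channel = ET.fromstring(SAMPLE).find("channel")
# Feed-level metadata: title, link, description.
feed = {tag: channel.findtext(tag) for tag in ("title", "link", "description")}
# One dict per <item>, matching the structure the script builds.
feed["articles"] = [
    {
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        "published": item.findtext("pubDate", default="-"),
        "description": item.findtext("description"),
    }
    for item in channel.iter("item")
]
print(feed["title"], len(feed["articles"]))  # → Example Feed 1
```

The real script produces the same feed dictionary, then turns it into one markdown page per feed and one per article.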

Python Code

Now create a new file python-markdown-mkdocs/src/genrss.py and set its contents to the following:

from bs4 import BeautifulSoup
from t_python_markdown import Document, Header, Paragraph, Sentence, Link, UnorderedList, Bold, Code, HorizontalRule
import json
import requests


class Rss():
  def __init__(self):
    self.__article_count = 0

  def __read_rss(self, rss_feed_url: str):
    """Read RSS feed"""
    try:
      r = requests.get(rss_feed_url, allow_redirects=True)
      soup = BeautifulSoup(r.content, features="xml")
      feed = {_: soup.find("channel").find(_).text for _ in ["title", "link", "description"]}
      feed["description"] = feed["description"] if feed["description"] else feed["title"]
      articles = []
      for a in soup.findAll("item"):
        article = {
          "title": a.find("title").text,
          "link": a.find("link").text,
          "published": a.find("pubDate").text if a.find("pubDate") else "-",
          "description": a.find("description").text
        }
        articles.append(article)
      feed.update({"articles": articles, "url": rss_feed_url})
      return feed
    except Exception as e:
      print(f"Unable to process {rss_feed_url}: {e}")
      return {"title": "", "description": ""}

  def __write_markdown(self, feed: dict) -> str:
    """Write markdown using supplied RSS feed details"""
    feed_filename = " ".join(feed["title"].lower().replace("-", "").replace("'", "").replace("–", "").split()).replace(" ", "_")
    d = Document({"title": feed["title"], "hide": ["navigation", "toc"]})
    d >> Paragraph([":arrow_backward: ", Link("Back to List of Feeds", "index.md")])
    feed_title = (feed["description"] if feed["description"] else feed["title"]).title()
    d >> Header(feed_title)
    ul = UnorderedList()
    d >> ul
    for lc, article in enumerate(feed.get("articles")):
      article_filename = f"{feed_filename}_{lc:05d}.md"
      title = article["title"] if article["title"] else article["link"].split("/")[-1][:-5].replace("-", " ").capitalize()
      ul >> Sentence([Link(title, article_filename), f" ({article['published']})"], end="")
      da = Document({"title": title, "hide": ["navigation"]})
      da >> Paragraph([":arrow_backward: ", Link(f"Back to {feed_title}", f"{feed_filename}.md")])
      da >> Header(feed["description"].title())
      da >> Header(title, 2)
      da >> Paragraph(Bold(article["published"]))
      da >> Paragraph(article["description"])
      da >> HorizontalRule()
      da >> Sentence([":link:", Link("Link to article", article["link"])])
      da.write(f"docs/{article_filename}")
      self.__article_count += 1
    feed_filename += ".md"
    d >> HorizontalRule()
    d >> Paragraph([":material-rss: RSS Feed URL:", Link(Code(feed["url"]), feed["url"])])
    d >> Paragraph(["Article Count:", str(len(feed.get("articles")))])
    d.write(f"docs/{feed_filename}")
    return feed_filename

  def process_feed_urls(self, rss_feed_urls: list):
    """Process list of RSS feeds"""
    d = Document({"title": "RSS Feeds", "hide": ["navigation", "toc"]})
    d >> Header("RSS Feeds")
    ps = Paragraph()
    d >> ps
    ul = UnorderedList()
    d >> ul
    feeds = [self.__read_rss(url) for url in rss_feed_urls]
    for feed in sorted(feeds, key=lambda l: l["description"]):
      if feed["title"]:
        ul >> Link(feed["description"], self.__write_markdown(feed))
    ps >> Sentence(["There are", str(len(rss_feed_urls)), "feeds listing", str(self.__article_count), "articles"])
    d.write("docs/index.md")


urls = ["https://www.linuxjournal.com/node/feed",
        "https://news.ycombinator.com/rss",
        "https://feeds.bbci.co.uk/news/uk/rss.xml",
        "https://insights.ubuntu.com/feed",
        "https://lordslibrary.parliament.uk/type/research-briefing/feed/",
        "http://www.metoffice.gov.uk/public/data/PWSCache/WarningsRSS/Region/UK",
        "https://pypi.org/rss/packages.xml",
        "https://www.bankofengland.co.uk/rss/news",
        "https://9to5linux.com/feed/"]
Rss().process_feed_urls(urls)

At the terminal, ensure the virtual environment is still active, change directory to src, and run:

python genrss.py

Afterwards, the website should change to show a list of the RSS feeds.

This example should work for most RSS feeds; any feed that cannot be fetched or parsed is reported and skipped.
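One dense line worth unpacking is the feed_filename expression in __write_markdown, which turns a feed title into a safe filename. Step by step, it is equivalent to:

```python
def feed_filename(title: str) -> str:
    # Lower-case, then drop hyphens, apostrophes and en-dashes.
    cleaned = title.lower().replace("-", "").replace("'", "").replace("–", "")
    # split() collapses runs of whitespace; rejoin with underscores.
    return "_".join(cleaned.split())

print(feed_filename("Linux Journal - News"))  # → linux_journal_news
```

Article pages then take this stem plus a zero-padded counter, e.g. linux_journal_news_00000.md.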

Further Exercises

Improve the Layout

Change the layout of the site. This example has a very basic layout. See what you can do to improve it.

Build a Website

Use MkDocs to build a website instead:

mkdocs build

then deploy the generated site directory using a web server of your choice. Regenerate on a schedule (say, hourly) and you will have a simple website that captures the news you want and keeps it up to date.
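A scheduled rebuild only needs two commands in sequence. A minimal sketch, assuming it is run from the src directory where genrss.py and mkdocs.yml live (the schedule mechanism itself — cron, a systemd timer, or similar — is up to you):

```python
import subprocess

def rebuild():
    # Regenerate the markdown pages from the feeds...
    subprocess.run(["python", "genrss.py"], check=True)
    # ...then build the static site into the site/ directory.
    subprocess.run(["mkdocs", "build"], check=True)
```

check=True makes a failure in either step raise an exception rather than silently building a stale site.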

GitLab Pages

If, like me, you use GitLab, add this code to a GitLab project and create a GitLab pipeline to deploy it via GitLab Pages. Then, to keep it up to date, add a pipeline schedule.