Intro 

A sitemap is a special XML file in a standard location that tells search engines which pages exist on your server, when they were last changed, and how they should be prioritized. Recently I wondered how to give my Flask projects a sitemap. There are existing solutions like flask-sitemap, but:

  • they only support basic sitemap features and none of Google's extensions
  • they have been unmaintained for several years

Since sitemaps are just XML documents, we can implement one ourselves.

Creating a sitemap location 

We need a route at the standard location, /sitemap.xml, that serves the sitemap with the correct MIME type:

import flask

@app.route("/sitemap.xml")
def sitemap():
    # getSitemapXmlDocument() builds the XML string, as described below
    return flask.Response(getSitemapXmlDocument(), mimetype='application/xml')

Create a sitemap 

We might want to skip some URLs, so we define those first:

skip = ["/api", "/ignore/this" ]
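
Since the entries in skip are URL paths, they are most naturally compared against a rule's path (Werkzeug exposes it as rule.rule) rather than its endpoint name. Here is a stdlib-only sketch of a segment-aware prefix match; is_skipped is a hypothetical helper name, and matching whole path segments keeps /apiary from being skipped just because it starts with /api:

```python
# Hypothetical helper: decide whether a URL path falls under one of the
# ignored prefixes, matching whole path segments only.
def is_skipped(path, skip):
    for prefix in skip:
        prefix = prefix.rstrip("/")
        # exact match, or the path continues below the prefix
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


skip = ["/api", "/ignore/this"]
print(is_skipped("/api/users", skip))    # True
print(is_skipped("/apiary", skip))       # False
print(is_skipped("/ignore/this", skip))  # True
```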

Then we iterate over all registered URL rules, skipping those we want to ignore and those that do not accept GET requests:

urls = []
for rule in app.url_map.iter_rules():

    # skip all ignored paths #
    if any(rule.rule.startswith(s) for s in skip):
        continue

    # skip all non-GET endpoints #
    if "GET" not in rule.methods:
        continue

An endpoint is not a URL, so we need to build the actual URL for the sitemap from the endpoint. Building it raises a werkzeug.routing.BuildError if, for example, the rule needs arguments we do not supply (such as the id in /article/<id>); such endpoints are usually not relevant for a sitemap anyway. Each rule belongs to exactly one endpoint, so we build at most one URL per rule, still within the loop:

    # build the URL for this endpoint; rules that need arguments    #
    # raise BuildError and are skipped                              #
    # (from werkzeug.routing import BuildError)                     #
    try:
        url = flask.url_for(rule.endpoint, **(rule.defaults or {}))
        urls.append(url)
    except BuildError:
        pass
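
Catching BuildError is the simplest approach. Alternatively, rules that need URL arguments can be filtered out before calling url_for: Werkzeug's Rule objects list their argument names in rule.arguments and their default values in rule.defaults. The sketch below uses a hypothetical FakeRule namedtuple as a stand-in so it runs without Flask; in a real app you would pass the rules from app.url_map.iter_rules() instead.

```python
from collections import namedtuple

# Stand-in for werkzeug.routing.Rule, only for this sketch (hypothetical).
FakeRule = namedtuple("FakeRule", ["rule", "endpoint", "arguments", "defaults"])


def can_build_without_args(rule):
    """True if every URL argument of the rule is covered by a default value."""
    defaults = rule.defaults or {}
    return all(arg in defaults for arg in rule.arguments)


rules = [
    FakeRule("/", "index", set(), None),
    FakeRule("/article/<int:id>", "article", {"id"}, None),
    FakeRule("/page/", "page", {"num"}, {"num": 1}),
]
buildable = [r.endpoint for r in rules if can_build_without_args(r)]
print(buildable)  # ['index', 'page']
```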

You can also add URLs that are not defined directly by endpoints. For example, if you have a bunch of articles served under one URL (like /example) and selected with an ?id= URL parameter, you can add them like this:

urls += [ "/example?id={}".format(ident) for ident in SOME_LIST_OF_IDS ]
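
If the IDs may contain characters that are not URL-safe, letting the standard library encode the query string is a bit more robust than plain string formatting. A sketch with hypothetical article IDs:

```python
from urllib.parse import urlencode

# Hypothetical IDs; in practice these would come from your database.
article_ids = [1, 42, "a b&c"]

# urlencode percent-encodes reserved characters in the query value
urls = ["/example?" + urlencode({"id": ident}) for ident in article_ids]
print(urls)  # ['/example?id=1', '/example?id=42', '/example?id=a+b%26c']
```

Ampersands that separate several query parameters must appear as &amp; inside a <loc> element; ElementTree escapes them automatically when it serializes element text.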

We need absolute URLs for our sitemap, which means we need a hostname and a protocol. If you are not running a reverse proxy, you can get this information from flask.request; otherwise, the reverse proxy should pass it on in headers (this example uses X-REAL-HOSTNAME and X-Forwarded-Proto; X-Forwarded-Host is another common choice for the hostname). Commercial reverse proxies may already do that by default.

hostname = flask.request.headers.get("X-REAL-HOSTNAME")
protocol = flask.request.headers.get("X-FORWARDED-PROTO")
if hostname and protocol:
    baseHost = "{proto}://{host}".format(host=hostname, proto=protocol)
else:
    # host_url returns PROTO + HOST (+ PORT) + '/'
    # we don't want the trailing '/'
    baseHost = flask.request.host_url.rstrip("/")
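
The fallback logic can be pulled into a small pure function, which makes it easy to test without a request context. In this sketch, headers is a plain dict and host_url a string standing in for flask.request.headers and flask.request.host_url; the function name get_base_host is hypothetical. Note that Werkzeug's real headers object matches names case-insensitively, while a plain dict does not.

```python
def get_base_host(headers, host_url):
    """Return 'proto://host' without a trailing slash (hypothetical helper)."""
    hostname = headers.get("X-REAL-HOSTNAME")
    protocol = headers.get("X-FORWARDED-PROTO")
    if hostname and protocol:
        # behind a reverse proxy that forwards both values
        return "{}://{}".format(protocol, hostname)
    # no (complete) proxy headers: fall back to the URL of the request
    return host_url.rstrip("/")


proxied = {"X-REAL-HOSTNAME": "example.com", "X-FORWARDED-PROTO": "https"}
print(get_base_host(proxied, "http://127.0.0.1:5000/"))  # https://example.com
print(get_base_host({}, "http://127.0.0.1:5000/"))       # http://127.0.0.1:5000
```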

Our sitemap contains only a collection of URLs, so the expected top-level element is a urlset:

import xml.etree.ElementTree as et
top = et.Element('urlset', xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
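
Passing xmlns as an ordinary attribute makes ElementTree write the default sitemap namespace onto the root element exactly as given. A quick stdlib check:

```python
import xml.etree.ElementTree as et

top = et.Element('urlset', xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
print(et.tostring(top, encoding='unicode'))
# <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" />
```

Child elements added with plain tag names like 'url' then belong to that default namespace when the document is parsed, which is what the sitemap schema expects.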

It should be noted that the behaviour of XML libraries

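import datetime

# lastmod and priority are not collected above; as an assumption, every URL
# gets today's date and the protocol's default priority of 0.5
entries = [(url, datetime.date.today(), 0.5) for url in urls]

for url, lastmod, priority in entries:
    child = et.SubElement(top, 'url')

    childLoc     = et.SubElement(child, 'loc')
    childLastmod = et.SubElement(child, 'lastmod')
    childPrio    = et.SubElement(child, 'priority')

    childPrio.text    = str(priority)
    childLastmod.text = lastmod.strftime("%Y-%m-%d")
    childLoc.text     = baseHost + url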

Finally, we prepend an XML declaration and serialize all elements into a string:

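xmlHeader = "<?xml version='1.0' encoding='UTF-8'?>"
# encoding='unicode' returns a str without an XML declaration,
# so our header ends up in the output exactly once
xmlDump = xmlHeader + et.tostring(top, encoding='unicode', method='xml')
return xmlDump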
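
Put together, the XML-building steps fit into one function that needs nothing beyond the standard library. This is a sketch under the assumption that the entries are (path, lastmod, priority) tuples and the base host has already been determined; build_sitemap is a hypothetical name:

```python
import datetime
import xml.etree.ElementTree as et

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_host, entries):
    """Return a sitemap document for (path, lastmod, priority) tuples."""
    top = et.Element('urlset', xmlns=SITEMAP_NS)
    for path, lastmod, priority in entries:
        child = et.SubElement(top, 'url')
        et.SubElement(child, 'loc').text = base_host + path
        et.SubElement(child, 'lastmod').text = lastmod.strftime("%Y-%m-%d")
        et.SubElement(child, 'priority').text = str(priority)
    header = "<?xml version='1.0' encoding='UTF-8'?>"
    # encoding='unicode' yields a str without its own declaration
    return header + et.tostring(top, encoding='unicode')


entries = [
    ("/", datetime.date(2024, 1, 15), 1.0),
    ("/example?id=42", datetime.date(2024, 1, 10), 0.5),
]
print(build_sitemap("https://example.com", entries))
```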

Now we can check the XML sitemap in Google Search Console or with a site like the XML Sitemap Validator.
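
Before submitting, a quick local sanity check can catch obvious mistakes such as relative URLs. The sketch below, using the hypothetical name check_sitemap, parses the document and reports every <loc> that is not an absolute http(s) URL and every priority outside 0.0 to 1.0; it is no substitute for a real validator:

```python
import xml.etree.ElementTree as et
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def check_sitemap(xml_text):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    root = et.fromstring(xml_text.encode("utf-8"))
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        parsed = urlparse(loc)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("not an absolute URL: " + repr(loc))
        prio = url.findtext("sm:priority", default=None, namespaces=NS)
        if prio is not None and not 0.0 <= float(prio) <= 1.0:
            problems.append("priority out of range: " + prio)
    return problems


good = ("<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>"
        "<url><loc>https://example.com/</loc><priority>0.5</priority></url>"
        "</urlset>")
bad = good.replace("https://example.com/", "/about")
print(check_sitemap(good))  # []
print(check_sitemap(bad))   # ["not an absolute URL: '/about'"]
```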


Feel free to send me an email to share your thoughts!