Automating Dead Link Detection | HAHWUL


Using Deadfinder and GitHub Actions for Seamless Link Management

A dead link, or broken link, occurs when a hyperlink points to a web page that has been removed or no longer exists. Beyond being an inconvenience, dead links can degrade the user experience, harm your website’s SEO, and introduce security vulnerabilities. For instance:

  • Phishing Risks: When a domain expires, it can be purchased by malicious entities who might set up fake pages to capture sensitive information from users expecting a legitimate site.
  • Malware Distribution: Sometimes, if a linked resource is taken down, a similarly named malicious site might take its place, potentially leading unsuspecting visitors to download malware.
  • Loss of Trust: Frequent encounters with dead links can erode user trust in your content, suggesting neglect or outdated information, which might be exploited by attackers to lure users towards harmful alternatives.

Therefore, maintaining link integrity not only preserves your site’s professionalism and usability but also plays a critical role in safeguarding your visitors from potential security threats.

What is Deadfinder?

Deadfinder is a tool that helps webmasters and bloggers keep their sites intact by finding dead links. I created it to manage my own website.

Deadfinder also supports GitHub Actions, so you can run it with the following workflow code:

steps:
- name: Run DeadFinder
  uses: hahwul/deadfinder@1.5.0
  # or uses: hahwul/deadfinder@latest
  id: broken-link
  with:
    command: sitemap # url / file / sitemap
    target: https://www.hahwul.com/sitemap.xml
    # timeout: 10
    # concurrency: 50
    # silent: false
    # headers: "X-API-Key: 123444"
    # worker_headers: "User-Agent: Deadfinder Bot"
    # include30x: false
    # user_agent: "Apple"
    # proxy: "http://localhost:8070"

- name: Output Handling
  run: echo '${{ steps.broken-link.outputs.output }}'
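
The `command` input also accepts `url` and `file`, as the comment in the workflow notes. As a minimal sketch, assuming `url` mode takes a single page address in `target`, a one-page check might look like this (the step name, the target page, and the choice to enable these two options are illustrative, not recommendations):

```yaml
steps:
- name: Check a single page
  uses: hahwul/deadfinder@1.5.0
  id: broken-link
  with:
    command: url                      # assumption: scans only the page given in target
    target: https://www.hahwul.com    # illustrative target page
    timeout: 10                       # option taken from the commented list above
    concurrency: 50                   # option taken from the commented list above
```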

Integrating Deadfinder with GitHub Actions

GitHub Actions is a CI/CD (Continuous Integration/Continuous Deployment) tool provided by GitHub, which allows you to automate your software development workflows directly in your repository. Here’s how you can leverage GitHub Actions to automate dead link detection with Deadfinder:

---
name: DeadLink
on:
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Find Broken Link
        uses: hahwul/deadfinder@1.5.0
        id: broken-link
        with:
          command: sitemap
          target: https://hahwul.com/sitemap.xml

      - name: Create Markdown Table from JSON
        id: create-markdown-table
        run: |
          echo "## DeadLink Report" > deadlink_report.md
          echo "" >> deadlink_report.md
          echo "| Target URL | Deadlink   |" >> deadlink_report.md
          echo "|------------|------------|" >> deadlink_report.md
          echo '${{ steps.broken-link.outputs.output }}' | jq -r 'to_entries[] | .key as $k | .value[] | "| \($k) | \(.) |"' >> deadlink_report.md

      - name: Read Markdown Table from File
        id: read-markdown-table
        run: |
          table_content=$(cat deadlink_report.md)
          echo "TABLE_CONTENT<<EOF" >> $GITHUB_ENV
          echo "$table_content" >> $GITHUB_ENV
          echo "EOF" >> $GITHUB_ENV

      - name: Create an issue
        uses: dacbd/create-issue-action@main
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          title: DeadLink Issue
          body: ${{ env.TABLE_CONTENT }}

This workflow runs Deadfinder against the sitemap, converts the dead links it reports into a Markdown table, and posts that table as a GitHub issue so that maintainers are aware of them. I use the same setup on the site you’re looking at now; since it has many articles, I run it periodically and remove the dead links it finds.
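
The workflow as written only starts on a manual `workflow_dispatch`. To run it periodically without clicking the button, a `schedule` trigger can be added next to it. The following is a minimal sketch, not the exact configuration used on this site: the weekly cron expression is an assumed cadence, and the `permissions` block is only needed when the repository's default `GITHUB_TOKEN` permissions are read-only.

```yaml
name: DeadLink
on:
  workflow_dispatch:          # keep the manual trigger
  schedule:
    - cron: "0 0 * * 1"       # assumed cadence: every Monday at 00:00 UTC
permissions:
  issues: write               # lets GITHUB_TOKEN create the report issue
```

Scheduled workflows run on the repository's default branch, and the cron expression is evaluated in UTC.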


GitHub workflow history


Found and reported deadlinks!

Conclusion

You can easily enhance the quality and security of your site with this straightforward method. Give it a try and manage your site using this technique 🙂


