Moloi Method

I run an AI-powered Discord bot that serves a community of 200+ members. It handles research queries, moderates content, posts scheduled updates, and has processed over 15,000 messages since deployment.

This tutorial walks through building one from scratch. By the end, you will have a production-ready AI Discord bot with LLM integration, rate limiting, content moderation, error handling, and a deployment setup that runs 24/7 on a VPS. This is the same architecture I use in production, not a toy demo.

<h2 id="what-we-are-building">What we are building</h2>

The bot does three things:

<ol>
<li>Responds to mentions and slash commands with AI-generated answers using an LLM API</li>
<li>Rate-limits users to prevent abuse and control API costs</li>
<li>Runs as a systemd service on a Linux VPS with logging and auto-restart</li>
</ol>

Total cost to operate: approximately $5-$30/month depending on usage (VPS + LLM API calls). A $6/month VPS handles a community of 500+ members easily. LLM costs scale with usage. At $0.003-$0.015 per query depending on the model, 1,000 queries/month costs $3-$15.
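The monthly estimate above is simple arithmetic, so a small sketch makes it easy to rerun with your own numbers. The function name and the example inputs are illustrative assumptions, not part of the bot; the per-query prices are the two ends of the range quoted above.

```python
def estimate_monthly_cost(vps_usd: float, queries: int, cost_per_query_usd: float) -> float:
    """Return VPS cost plus LLM API cost for one month, in USD."""
    return round(vps_usd + queries * cost_per_query_usd, 2)


# Example inputs taken from the ranges quoted above (assumed; adjust to your usage).
low = estimate_monthly_cost(vps_usd=6.0, queries=1000, cost_per_query_usd=0.003)
high = estimate_monthly_cost(vps_usd=6.0, queries=1000, cost_per_query_usd=0.015)
print(f"Estimated monthly cost: ${low:.2f} to ${high:.2f}")
```

Swapping in your own query volume is usually the most informative change, since the VPS cost stays flat while API cost grows linearly with usage.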
<h2 id="prerequisites">Prerequisites</h2>

<ul>
<li>Python 3.10+</li>
<li>A Discord account and a server where you have admin permissions</li>
<li>An API key for an LLM provider (this tutorial uses Anthropic's Claude API, but the pattern works with OpenAI, Google, or any LLM with a REST API)</li>
<li>A Linux VPS for deployment (DigitalOcean, Hetzner, or similar, $5-$10/month)</li>
</ul>

<h2 id="part-1-project-setup">Part 1: project setup</h2>

Create the project structure:

<div class="codehilite"><pre><code>mkdir discord-ai-bot && cd discord-ai-bot
python3 -m venv venv
source venv/bin/activate
pip install discord.py anthropic python-dotenv
</code></pre></div>

Your project structure:

<div class="codehilite"><pre><code>discord-ai-bot/
    bot.py              # Main bot file
    llm_client.py       # LLM API wrapper
    rate_limiter.py     # Rate limiting logic
    config.py           # Configuration loader
    .env                # Secrets (never commit this)
    requirements.txt    # Dependencies
</code></pre></div>

Create <code>requirements.txt</code>:

<div class="codehilite"><pre><code>discord.py>=2.3.0
anthropic>=0.39.0
python-dotenv>=1.0.0
</code></pre></div>

Create <code>.env</code> with your credentials:

<div class="codehilite"><pre><code>DISCORD_TOKEN=your_discord_bot_token_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GUILD_ID=your_server_id_here
</code></pre></div>

<h2 id="part-2-create-the-discord-application">Part 2: create the Discord application</h2>

Before writing code, you need a bot token from Discord.
<ol>
<li>Go to the <a href="https://discord.com/developers/applications">Discord Developer Portal</a></li>
<li>Click "New Application" and name it</li>
<li>Go to the "Bot" tab and click "Reset Token" to generate your token</li>
<li>Under "Privileged Gateway Intents," enable Message Content Intent (required for reading messages)</li>
<li>Go to "OAuth2" > "URL Generator," select <code>bot</code> and <code>applications.commands</code> scopes</li>
<li>Under bot permissions, select: Send Messages, Read Message History, Use Slash Commands, Embed Links</li>
<li>Copy the generated URL and open it to invite the bot to your server</li>
</ol>

Save the bot token in your <code>.env</code> file. Do not hardcode tokens anywhere in your source code. A single leaked token gives anyone full control of your bot.

<h2 id="part-3-configuration-loader">Part 3: configuration loader</h2>

Start with <code>config.py</code>. This loads environment variables and validates them at startup. Fail fast if anything is missing.

<div class="codehilite"><pre><code>import os

from dotenv import load_dotenv

load_dotenv()

REQUIRED_VARS = ["DISCORD_TOKEN", "ANTHROPIC_API_KEY", "GUILD_ID"]


def load_config():
    missing = [var for var in REQUIRED_VARS if not os.environ.get(var)]
    if missing:
        raise EnvironmentError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {
        "discord_token": os.environ["DISCORD_TOKEN"],
        "anthropic_api_key": os.environ["ANTHROPIC_API_KEY"],
        "guild_id": int(os.environ["GUILD_ID"]),
        "max_tokens": int(os.environ.get("MAX_TOKENS", "1024")),
        "rate_limit_per_user": int(os.environ.get("RATE_LIMIT", "10")),
        "rate_limit_window": int(os.environ.get("RATE_WINDOW", "60")),
    }
</code></pre></div>

This pattern (validate all config at startup, crash immediately if something is wrong) saves hours of debugging later. I have seen bots run for days with a bad config, silently failing on every request, because nobody validated inputs at startup.
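The loader above checks that the variables exist, but <code>int()</code> on a malformed <code>GUILD_ID</code> would still raise a bare <code>ValueError</code> that does not name the variable. One possible extension, sketched here with a hypothetical helper called <code>parse_int_env</code>, reports which variable was wrong and rejects non-positive values. It takes the environment as a plain mapping so it can be tried without touching real secrets.

```python
from typing import Mapping, Optional


def parse_int_env(env: Mapping[str, str], name: str, default: Optional[str] = None) -> int:
    """Read `name` from `env` and return it as a positive integer.

    Hypothetical helper for illustration; it is not part of config.py above.
    """
    raw = env.get(name, default)
    if raw is None or raw.strip() == "":
        raise EnvironmentError(f"Missing required environment variable: {name}")
    try:
        value = int(raw)
    except ValueError:
        raise EnvironmentError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise EnvironmentError(f"{name} must be positive, got {value}")
    return value


# Example with a fake environment (all values are placeholders).
fake_env = {"GUILD_ID": "123456789012345678", "RATE_LIMIT": "ten"}
print(parse_int_env(fake_env, "GUILD_ID"))            # a well-formed ID parses
print(parse_int_env(fake_env, "MAX_TOKENS", "1024"))  # an unset variable uses the default
try:
    parse_int_env(fake_env, "RATE_LIMIT")
except EnvironmentError as exc:
    print(f"Startup would stop here: {exc}")
```

In <code>load_config</code>, a call such as <code>parse_int_env(os.environ, "GUILD_ID")</code> could stand in for the direct <code>int(...)</code> conversions if you want the clearer error messages.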
<h2 id="part-4-rate-limiter">Part 4: rate limiter</h2>

Rate limiting is not optional. Without it, one enthusiastic user or one bad actor can burn through your entire monthly API budget in an afternoon. At $0.015 per query, 10,000 queries cost $150. A rate limiter caps that exposure.

<div class="codehilite"><pre><code>import time
from collections import defaultdict


class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = window_seconds
        # user_id -> timestamps of that user's recent requests
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: int) -> bool:
        now = time.time()
        cutoff = now - self.window
        # Remove expired timestamps
        self.requests[user_id] = [
            ts for ts in self.requests[user_id] if ts > cutoff
        ]
        if len(self.requests[user_id]) >= self.max_requests:
            return False
        self.requests[user_id].append(now)
        return True

    def remaining(self, user_id: int) -> int:
        now = time.time()
        cutoff = now - self.window
        self.requests[user_id] = [
            ts for ts in self.requests[user_id] if ts > cutoff
        ]
        return max(0, self.max_requests - len(self.requests[user_id]))

    def reset_time(self, user_id: int) -> float:
        # Seconds until the oldest recorded request leaves the window
        if not self.requests[user_id]:
            return 0.0
        return max(0.0, self.requests[user_id][0] + self.window - time.time())
</code></pre></div>

Default configuration: 10 requests per user per 60-second window. Generous enough for normal use, tight enough to prevent abuse. Adjust based on your community size and API budget.

<h2 id="part-5-llm-client">Part 5: LLM client</h2>

The LLM client wraps the API call with error handling, timeout management, and response validation. This is where most bot tutorials cut corners, and where most production bots break.
<div class="codehilite"><pre><code>import logging

import anthropic

logger = logging.getLogger(__name__)


class LLMClient:
    def __init__(self, api_key: str, max_tokens: int = 1024):
        # An explicit timeout (in seconds) keeps a stalled request from hanging a reply
        self.client = anthropic.Anthropic(api_key=api_key, timeout=30.0)
        self.max_tokens = max_tokens
        self.system_prompt = (
            "You are a helpful research assistant in a Discord community. "
            "Keep responses concise (under 1500 characters for Discord). "
            "Use markdown formatting. If you are unsure, say so. "
            "Never generate harmful, illegal, or explicit content."
        )

    def generate_response(self, user_message: str, context: str = "") -> str:
        try:
            prompt = user_message
            if context:
                prompt = f"Context from conversation:\n{context}\n\nUser question: {user_message}"
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=self.max_tokens,
                system=self.system_prompt,
                messages=[{"role": "user", "content": prompt}],
            )
            text = response.content[0].text
            # Discord has a 2000-character message limit
            if len(text) > 1900:
                text = text[:1897] + "..."
            return text
        except anthropic.RateLimitError:
            logger.warning("LLM API rate limit hit")
            return "I am currently rate-limited by my AI provider. Please try again in a moment."
        except anthropic.APIConnectionError:
            # Also covers request timeouts
            logger.error("LLM API connection failed")
            return "I could not reach my AI backend. Please try again shortly."
        except Exception as e:
            logger.error(f"LLM request failed: {type(e).__name__}: {e}")
            return "Something went wrong processing your request. Please try again."
</code></pre></div>

A few things to note about this implementation. Each failure mode gets its own error handler. API rate limits, connection failures (including timeouts), and unexpected errors all produce different responses. The user gets a helpful message; the log gets diagnostic detail. I enforce the 1,900 character limit in code because Discord caps messages at 2,000 characters and the model does not reliably respect the length hint in the system prompt. And the system prompt sets tone, length expectations, and content boundaries.
This is your first line of defense against the bot producing inappropriate content.

<h2 id="part-6-the-main-bot">Part 6: the main bot</h2>

Now we wire everything together. The bot listens for mentions and slash commands, checks rate limits, calls the LLM, and sends the response.

<div class="codehilite"><pre><code>import asyncio
import logging

import discord
from discord import app_commands

from config import load_config
from llm_client import LLMClient
from rate_limiter import RateLimiter

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    handlers=[
        logging.FileHandler("bot.log"),
        logging.StreamHandler(),
    ],
)
logger = logging.getLogger(__name__)

config = load_config()

intents = discord.Intents.default()
intents.message_content = True

bot = discord.Client(intents=intents)
tree = app_commands.CommandTree(bot)
rate_limiter = RateLimiter(
    config["rate_limit_per_user"],
    config["rate_limit_window"],
)
llm = LLMClient(config["anthropic_api_key"], config["max_tokens"])


@bot.event
async def on_ready():
    guild = discord.Object(id=config["guild_id"])
    tree.copy_global_to(guild=guild)
    await tree.sync(guild=guild)
    logger.info(f"Bot online as {bot.user} | Guild: {config['guild_id']}")


@bot.event
async def on_message(message):
    # Ignore own messages
    if message.author == bot.user:
        return
    # Only respond when mentioned
    if bot.user not in message.mentions:
        return
    # Rate limit check
    if not rate_limiter.is_allowed(message.author.id):
        remaining = rate_limiter.reset_time(message.author.id)
        await message.reply(
            f"You have hit the rate limit. Try again in {int(remaining)} seconds."
        )
        return
    # Strip the mention from the message (both the <@id> and <@!id> forms)
    clean_content = (
        message.content.replace(f"<@{bot.user.id}>", "")
        .replace(f"<@!{bot.user.id}>", "")
        .strip()
    )
    if not clean_content:
        await message.reply("You mentioned me but did not ask anything. How can I help?")
        return
    # Show typing indicator while generating
    async with message.channel.typing():
        # The LLM client is synchronous; run it in a worker thread so it
        # does not block the bot's event loop while waiting on the API.
        response = await asyncio.to_thread(llm.generate_response, clean_content)
    await message.reply(response)
    logger.info(
        f"Responded to {message.author} in #{message.channel}: "
        f"{clean_content[:80]}..."
    )


@tree.command(name="ask", description="Ask the AI assistant a question")
@app_commands.describe(question="Your question for the AI")
async def ask_command(interaction: discord.Interaction, question: str):
    if not rate_limiter.is_allowed(interaction.user.id):
        remaining = rate_limiter.reset_time(interaction.user.id)
        await interaction.response.send_message(
            f"Rate limit reached. Try again in {int(remaining)} seconds.",
            ephemeral=True,
        )
        return
    # Defer first: the LLM call can take longer than Discord's initial response window
    await interaction.response.defer()
    response = await asyncio.to_thread(llm.generate_response, question)
    await interaction.followup.send(response)
    logger.info(f"Slash command from {interaction.user}: {question[:80]}...")


@tree.command(name="status", description="Check bot status and your rate limit")
async def status_command(interaction: discord.Interaction):
    remaining = rate_limiter.remaining(interaction.user.id)
    await interaction.response.send_message(
        f"Bot is online.\n"
        f"Your remaining queries: **{remaining}** / {config['rate_limit_per_user']} "
        f"(resets every {config['rate_limit_window']}s)",
        ephemeral=True,
    )


bot.run(config["discord_token"])
</code></pre></div>

This gives you two ways to interact with the bot: @mention it in any channel, or use the <code>/ask</code> slash command. The slash command provides a cleaner UX and better discoverability for new users. Both paths run the LLM call in a worker thread via <code>asyncio.to_thread</code>, because a blocking API call inside an event handler would stall every other event the bot is handling.

<h2 id="part-7-adding-guardrails">Part 7: adding guardrails</h2>

The system prompt provides basic content filtering, but production bots need more layers. Here are three I implement on every bot.

<h3 id="input-sanitization">Input sanitization</h3>

Discord messages can contain Unicode exploits, excessively long strings, and embedded formatting that confuses LLMs. Sanitize before sending to the API.
<div class="codehilite"><pre><code>import re

MAX_INPUT_LENGTH = 2000


def sanitize_input(text: str) -> str:
    # Remove Discord-specific formatting that confuses LLMs
    text = re.sub(r"<@!?\d+>", "", text)      # Remove mentions
    text = re.sub(r"<#\d+>", "", text)        # Remove channel refs
    text = re.sub(r"<a?:\w+:\d+>", "", text)  # Remove custom emoji
    # Truncate excessively long inputs
    text = text[:MAX_INPUT_LENGTH].strip()
    return text
</code></pre></div>

<h3 id="output-validation">Output validation</h3>

Before sending the LLM response to the channel, validate it does not contain content that violates your community guidelines. This is your second line of defense after the system prompt.

<div class="codehilite"><pre><code>import re

BLOCKED_PATTERNS = [
    r"(?i)\b(api[_\s]?key|token|password|secret)\b.*[:=]\s*\S+",
]


def validate_output(text: str) -> tuple[bool, str]:
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, text):
            return False, "Response contained potentially sensitive content and was blocked."
    return True, text
</code></pre></div>

<h3 id="error-budget-tracking">Error budget tracking</h3>

Track your daily API spend and automatically disable the bot if costs exceed a threshold. This prevents a single viral thread from generating a $500 API bill overnight.
<div class="codehilite"><pre><code>import json
from datetime import date
from pathlib import Path


class CostTracker:
    def __init__(self, daily_limit: float = 10.0, state_file: str = "cost_state.json"):
        self.daily_limit = daily_limit
        self.state_file = Path(state_file)
        self._load_state()

    def _load_state(self):
        self.today = str(date.today())
        self.today_cost = 0.0
        if self.state_file.exists():
            data = json.loads(self.state_file.read_text())
            # Only reuse the saved total if it belongs to the current day
            if data.get("date") == self.today:
                self.today_cost = data["cost"]

    def _roll_over_if_new_day(self):
        # A bot that runs for days must reset its total when the date changes
        if str(date.today()) != self.today:
            self.today = str(date.today())
            self.today_cost = 0.0

    def _save_state(self):
        self.state_file.write_text(json.dumps({
            "date": self.today,
            "cost": self.today_cost,
        }))

    def record_cost(self, cost: float) -> bool:
        self._roll_over_if_new_day()
        self.today_cost += cost
        self._save_state()
        return self.today_cost < self.daily_limit

    def is_within_budget(self) -> bool:
        self._roll_over_if_new_day()
        return self.today_cost < self.daily_limit
</code></pre></div>

The tracker resets its running total when the date changes, so the daily limit still works when the process stays up for weeks. As a reference point, a Claude Sonnet query with 500 input tokens and 500 output tokens costs approximately $0.009 at list prices of $3 per million input tokens and $15 per million output tokens (check current pricing before relying on this). At a $10/day budget, that allows roughly 1,100 queries per day, more than enough for a community of several hundred members. Adjust the limit based on your actual usage patterns.

<h2 id="part-8-deployment-on-a-vps">Part 8: deployment on a VPS</h2>

A Discord bot needs to run 24/7. Your laptop is not a server. Here is how to deploy to a Linux VPS with systemd for process management.
<h3 id="upload-and-install">Upload and install</h3>

<div class="codehilite"><pre><code># On your VPS
sudo mkdir -p /opt/discord-bot
cd /opt/discord-bot

# Copy files (from your local machine)
# scp -r ./* user@your-vps:/opt/discord-bot/

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Create the service account used by the systemd unit and give it the files
sudo useradd --system botuser
sudo chown -R botuser:botuser /opt/discord-bot
</code></pre></div>

<h3 id="create-the-systemd-service">Create the systemd service</h3>

Create <code>/etc/systemd/system/discord-bot.service</code>:

<div class="codehilite"><pre><code>[Unit]
Description=AI Discord Bot
After=network.target

[Service]
Type=simple
User=botuser
WorkingDirectory=/opt/discord-bot
EnvironmentFile=/opt/discord-bot/.env
ExecStart=/opt/discord-bot/venv/bin/python bot.py
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
</code></pre></div>

<h3 id="enable-and-start">Enable and start</h3>

<div class="codehilite"><pre><code>sudo systemctl daemon-reload
sudo systemctl enable discord-bot
sudo systemctl start discord-bot

# Check status
sudo systemctl status discord-bot

# View logs
sudo journalctl -u discord-bot -f
</code></pre></div>

The <code>Restart=always</code> directive means systemd will restart the bot if it crashes. <code>RestartSec=10</code> adds a 10-second delay to prevent rapid restart loops. Combined with the bot's own error handling, this gives you a deployment that recovers from crashes automatically. The unit runs as <code>botuser</code>, which is why the install step creates that account and hands it ownership of the directory.

<h3 id="log-rotation">Log rotation</h3>

The bot writes to <code>bot.log</code> in its working directory. Set up logrotate to prevent the file from growing indefinitely. Create <code>/etc/logrotate.d/discord-bot</code>:

<div class="codehilite"><pre><code>/opt/discord-bot/bot.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
</code></pre></div>

The <code>copytruncate</code> option matters here: the bot keeps <code>bot.log</code> open for as long as it runs, so the file has to be copied and truncated in place. Rotating by rename would leave the bot writing to the old, rotated file. This keeps 14 days of compressed logs. For a typical bot, that is about 50-100 MB of storage.
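If you would rather not manage a logrotate file, Python's standard library can rotate the log from inside the process. The sketch below is an alternative to the logrotate setup, not what the service above uses; the size limit, backup count, logger name, and the <code>build_rotating_logger</code> function are all assumptions to tune.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def build_rotating_logger(path: str, max_bytes: int, backups: int) -> logging.Logger:
    """Return a logger that rolls `path` over once it would exceed `max_bytes`."""
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s")
    )
    logger = logging.getLogger("rotation-demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger


# Demo in a temporary directory with a tiny size limit so rotation is visible.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "bot.log")
demo_logger = build_rotating_logger(log_path, max_bytes=500, backups=3)
for i in range(50):
    demo_logger.info("message number %d", i)

# Lists the active log plus its numbered backups (bot.log.1, bot.log.2, ...)
print(sorted(os.listdir(log_dir)))
```

In the bot itself you would pass a realistic limit (a few megabytes, say) and use this handler in place of <code>logging.FileHandler("bot.log")</code>. Size-based rotation does not give you the "14 days" guarantee of the logrotate config, so pick whichever retention rule matters more to you.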
<h2 id="part-9-advanced-features">Part 9: advanced features</h2>

Once the core bot is running reliably, here are three features that add real value.

<h3 id="scheduled-posts">Scheduled posts</h3>

Use discord.py's built-in task loop to post recurring content: daily summaries, market updates, or community announcements.

<div class="codehilite"><pre><code>import asyncio
import datetime

from discord.ext import tasks


# 9 AM UTC daily
@tasks.loop(time=datetime.time(hour=9, minute=0, tzinfo=datetime.timezone.utc))
async def daily_update():
    channel = bot.get_channel(YOUR_CHANNEL_ID)  # replace with your channel's integer ID
    if channel is None:
        logger.error("Daily update channel not found")
        return
    prompt = (
        "Generate a brief daily community update with today's date. "
        "Include a motivational note and a discussion prompt."
    )
    # Run the synchronous LLM call in a worker thread to keep the event loop free
    summary = await asyncio.to_thread(llm.generate_response, prompt)
    await channel.send(f"**Daily Update**\n{summary}")


# Merge this into your existing on_ready; defining a second one replaces the first
@bot.event
async def on_ready():
    # ... existing on_ready code ...
    if not daily_update.is_running():
        daily_update.start()
</code></pre></div>

<h3 id="multi-channel-awareness">Multi-channel awareness</h3>

Different channels often need different bot behaviors. A <code>#research</code> channel might need detailed, technical responses while a <code>#general</code> channel needs shorter, casual ones.

<div class="codehilite"><pre><code>CHANNEL_CONFIGS = {
    "research": {
        "system_prompt": "You are a research assistant. Provide detailed, cited responses.",
        "max_tokens": 1500,
    },
    "general": {
        "system_prompt": "You are a friendly community bot. Keep responses brief and casual.",
        "max_tokens": 500,
    },
}


def get_channel_config(channel_name: str) -> dict:
    # Fall back to the "general" settings for channels without their own entry
    return CHANNEL_CONFIGS.get(channel_name, CHANNEL_CONFIGS["general"])
</code></pre></div>

<h3 id="conversation-context">Conversation context</h3>

For more natural conversations, pass recent channel history as context to the LLM.
<div class="codehilite"><pre><code>async def get_recent_context(channel, limit=5):
    messages = []
    # history() yields newest messages first
    async for msg in channel.history(limit=limit):
        if msg.author != bot.user:
            messages.append(f"{msg.author.display_name}: {msg.content[:200]}")
    # Put the messages back in chronological order
    messages.reverse()
    return "\n".join(messages)
</code></pre></div>

This adds context but also adds cost. Each additional message in the context increases token usage. At 5 messages of context, expect a 30-50% increase in per-query cost. Monitor your cost tracker and adjust accordingly.

<h2 id="part-10-monitoring-and-maintenance">Part 10: monitoring and maintenance</h2>

A deployed bot needs ongoing attention. Here is the monitoring setup I use.

<h3 id="health-check-endpoint">Health check endpoint</h3>

Add a simple HTTP health check so external monitoring services (UptimeRobot, Better Stack) can verify the bot is running.

<div class="codehilite"><pre><code>import asyncio

from aiohttp import web

health_server_started = False


async def health_handler(request):
    return web.json_response({"status": "healthy", "latency": bot.latency})


async def start_health_server():
    app = web.Application()
    app.router.add_get("/health", health_handler)
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, "0.0.0.0", 8080)
    await site.start()


@bot.event
async def on_ready():
    global health_server_started
    # ... existing on_ready code ...
    # on_ready can fire again after a reconnect; bind the port only once
    if not health_server_started:
        health_server_started = True
        asyncio.create_task(start_health_server())
</code></pre></div>

<h3 id="monthly-maintenance-checklist">Monthly maintenance checklist</h3>

I run this checklist on the first of every month for every bot I maintain:

<ul>
<li>[ ] Review error logs for recurring issues</li>
<li>[ ] Check API cost trends.
Are they stable, growing, or spiking?</li>
<li>[ ] Update dependencies (<code>pip list --outdated</code>)</li>
<li>[ ] Verify rate limit settings are appropriate for current community size</li>
<li>[ ] Test the bot manually with 5-10 representative queries</li>
<li>[ ] Confirm log rotation is working (check disk usage)</li>
<li>[ ] Review and update the system prompt if community needs have changed</li>
</ul>

Average monthly maintenance time: 30-45 minutes per bot. At $150/hour consulting rates, that is $75-$112.50/month in maintenance cost. Factor this into your total cost of ownership.

<h2 id="what-this-architecture-supports">What this architecture supports</h2>

This tutorial gives you a foundation that handles communities up to 1,000 active members. For larger deployments, you would add message queuing (Redis or RabbitMQ) to handle burst traffic, database storage (PostgreSQL) for conversation history and analytics, multiple bot instances behind a load balancer for high availability, and webhook integration to connect the bot with external services.

But for most communities, the architecture in this tutorial is more than sufficient. My production bot handles 200+ daily active users on a single $6/month VPS with no performance issues.

<h2 id="build-vs-buy">Build vs. buy</h2>

Before building a custom bot, consider whether an off-the-shelf solution fits. Tools like MEE6, Dyno, and Carl-bot handle moderation and basic automation well. The case for building custom is when you need LLM-powered responses tuned to your community's domain, integration with your specific business systems (CRM, scheduling, databases), full control over data privacy and cost management, or features that no existing bot provides. If your needs are simpler, start with an existing bot and build custom when you hit its limits.

For a broader perspective on when to build custom versus use existing tools, see my post on <a href="/blog/ai-agents-vs-zapier">AI agents vs. Zapier</a>.
If you need a custom AI bot built for your community or business and do not want to build it yourself, <a href="/services/automation-audit">check our services</a>. Discord bots are one of the most common agentic engineering projects I deliver.

<hr />

<h2 id="frequently-asked-questions">Frequently asked questions</h2>

<h3 id="how-much-does-it-cost-to-run-an-ai-discord-bot">How much does it cost to run an AI Discord bot?</h3>

A VPS costs $5-$10/month. LLM API costs depend on usage: at $0.003-$0.015 per query, a community generating 500 queries/month costs $1.50-$7.50 in API fees. Total: $6.50-$17.50/month for most small to mid-size communities. The cost tracker in this tutorial helps you monitor and cap spending.

<h3 id="can-i-use-a-free-llm-instead-of-a-paid-api">Can I use a free LLM instead of a paid API?</h3>

Yes. You can run open-source models like Llama or Mistral locally on your VPS, but you will need a more powerful server (at least 16 GB RAM, ideally with a GPU). The tradeoff is higher server cost ($30-$80/month) but zero per-query API fees. For most small communities, the paid API approach is cheaper and simpler.

<h3 id="how-do-i-prevent-the-bot-from-generating-harmful-content">How do I prevent the bot from generating harmful content?</h3>

Three layers of defense: a system prompt that explicitly prohibits harmful content, output validation that checks responses before sending, and rate limiting that prevents abuse at scale. No single layer is perfect, but together they provide strong protection. You should also set up a moderation log channel where flagged content is posted for human review.

<h3 id="can-this-bot-handle-multiple-discord-servers">Can this bot handle multiple Discord servers?</h3>

Yes, with minor modifications. Remove the <code>GUILD_ID</code> restriction and sync commands globally instead of per-guild. Be aware that global command sync takes up to an hour to propagate, and your rate limiting and cost tracking should account for aggregate usage across all servers.
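For the multi-server case, one way to account for aggregate usage is to track a per-server window alongside the per-user one. This is a sketch under assumptions: the class name <code>GuildAwareRateLimiter</code> and the limits are hypothetical, and it applies the same sliding-window idea as the rate limiter in Part 4 rather than reusing that class. The <code>now</code> argument exists only so the behavior can be checked without waiting for real time to pass.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class GuildAwareRateLimiter:
    """Sliding-window limiter with a per-user cap and a per-guild cap."""

    def __init__(self, per_user: int, per_guild: int, window_seconds: float):
        self.per_user = per_user
        self.per_guild = per_guild
        self.window = window_seconds
        self.user_hits: Dict[int, Deque[float]] = defaultdict(deque)
        self.guild_hits: Dict[int, Deque[float]] = defaultdict(deque)

    def _prune(self, hits: Deque[float], now: float) -> None:
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()

    def is_allowed(self, guild_id: int, user_id: int, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        user, guild = self.user_hits[user_id], self.guild_hits[guild_id]
        self._prune(user, now)
        self._prune(guild, now)
        if len(user) >= self.per_user or len(guild) >= self.per_guild:
            return False
        # Record the request against both counters only when it is accepted.
        user.append(now)
        guild.append(now)
        return True


limiter = GuildAwareRateLimiter(per_user=2, per_guild=3, window_seconds=60)
print(limiter.is_allowed(guild_id=1, user_id=10, now=0.0))   # first request from user 10
print(limiter.is_allowed(guild_id=1, user_id=10, now=1.0))   # second request from user 10
print(limiter.is_allowed(guild_id=1, user_id=10, now=2.0))   # user 10 is over the per-user cap
print(limiter.is_allowed(guild_id=1, user_id=11, now=3.0))   # a different user still fits
print(limiter.is_allowed(guild_id=1, user_id=12, now=4.0))   # the guild cap of 3 is reached
print(limiter.is_allowed(guild_id=1, user_id=12, now=61.5))  # the two earliest hits have expired
```

In a handler, <code>guild_id</code> would come from <code>message.guild.id</code> or <code>interaction.guild_id</code>. Rejected requests are deliberately not recorded, so a user who keeps retrying while limited does not extend their own lockout.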

Build an AI-Powered Discord Bot: Complete Python Tutorial with LLM Integration