Rolling Your Own Plaid-to-Beancount Pipeline - A Python Guide

While tools like plaid2text are great, some of us prefer building custom solutions. After setting up Plaid integrations for several of my small business clients, I wanted to share a guide for building your own Plaid-to-Beancount pipeline from scratch.

Why Build Your Own?

  1. Full control over the data flow and transformation
  2. No MongoDB dependency (plaid2text requires it)
  3. Custom business logic for your specific needs
  4. Learning opportunity to understand how Plaid works

Prerequisites

  • Python 3.9+
  • A Plaid developer account (plaid.com)
  • Basic familiarity with REST APIs

Step 1: Set Up Your Environment

# Create a virtual environment
python -m venv plaid-beancount
source plaid-beancount/bin/activate

# Install dependencies
pip install plaid-python beancount

Step 2: Authentication and Linking

First, you need to link a bank account. Plaid uses a flow called “Link” which requires a web interface for the user to authenticate with their bank.

import plaid
from plaid.api import plaid_api
from plaid.model.products import Products
from plaid.model.country_code import CountryCode

# Configure Plaid client
configuration = plaid.Configuration(
    host=plaid.Environment.Development,  # Use Sandbox for testing
    api_key={
        'clientId': 'YOUR_CLIENT_ID',
        'secret': 'YOUR_SECRET',
    }
)

api_client = plaid.ApiClient(configuration)
client = plaid_api.PlaidApi(api_client)

# Create a link token
from plaid.model.link_token_create_request import LinkTokenCreateRequest
from plaid.model.link_token_create_request_user import LinkTokenCreateRequestUser

request = LinkTokenCreateRequest(
    products=[Products("transactions")],
    client_name="My Beancount App",
    country_codes=[CountryCode('US')],
    language='en',
    user=LinkTokenCreateRequestUser(client_user_id='unique-user-id')
)

response = client.link_token_create(request)
link_token = response.link_token
print(f"Open Plaid Link with token: {link_token}")

You’ll need a simple web page to complete the link flow and receive the public_token, which you then exchange for a persistent access_token.

Step 3: Fetch Transactions

Once you have an access token, fetching transactions is straightforward:

from datetime import datetime, timedelta
from plaid.model.transactions_get_request import TransactionsGetRequest
from plaid.model.transactions_get_request_options import TransactionsGetRequestOptions

def get_transactions(access_token, start_date=None, end_date=None):
    """Fetch transactions from Plaid."""
    if not start_date:
        start_date = datetime.now() - timedelta(days=30)
    if not end_date:
        end_date = datetime.now()
    
    request = TransactionsGetRequest(
        access_token=access_token,
        start_date=start_date.date(),
        end_date=end_date.date(),
        options=TransactionsGetRequestOptions(
            include_personal_finance_category=True
        )
    )
    
    response = client.transactions_get(request)
    transactions = response.transactions
    
    # Handle pagination for accounts with many transactions
    while len(transactions) < response.total_transactions:
        request = TransactionsGetRequest(
            access_token=access_token,
            start_date=start_date.date(),
            end_date=end_date.date(),
            options=TransactionsGetRequestOptions(
                offset=len(transactions)
            )
        )
        response = client.transactions_get(request)
        transactions.extend(response.transactions)
    
    return transactions

Step 4: Convert to Beancount Format

Now the fun part - converting Plaid transactions to Beancount entries:

from beancount.core import data
from beancount.core.amount import Amount
from beancount.core.number import D
from decimal import Decimal

def transaction_to_beancount(plaid_txn, account_mapping):
    """Convert a Plaid transaction to a Beancount transaction."""
    
    # Parse the amount (Plaid uses positive for debits)
    amount = D(str(abs(plaid_txn.amount)))
    if plaid_txn.amount > 0:
        # Debit (expense)
        posting_account = categorize_transaction(plaid_txn)
        bank_amount = -amount
    else:
        # Credit (income/refund)
        posting_account = "Income:Unknown"
        bank_amount = amount
    
    # Get bank account from mapping
    bank_account = account_mapping.get(
        plaid_txn.account_id, 
        "Assets:Bank:Unknown"
    )
    
    # Build metadata
    meta = data.new_metadata('plaid', 0)
    meta['plaid_id'] = plaid_txn.transaction_id
    meta['plaid_category'] = ' > '.join(
        plaid_txn.personal_finance_category.detailed.split('_')
    ) if plaid_txn.personal_finance_category else 'Unknown'
    
    # Create postings
    postings = [
        data.Posting(
            bank_account,
            Amount(bank_amount, 'USD'),
            None, None, None, None
        ),
        data.Posting(
            posting_account,
            Amount(-bank_amount, 'USD'),
            None, None, None, None
        ),
    ]
    
    # Create transaction
    txn = data.Transaction(
        meta=meta,
        date=plaid_txn.date,
        flag='*',
        payee=plaid_txn.merchant_name or plaid_txn.name,
        narration=plaid_txn.name,
        tags=set(),
        links=set(),
        postings=postings
    )
    
    return txn

def categorize_transaction(plaid_txn):
    """Map Plaid categories to Beancount accounts."""
    category = plaid_txn.personal_finance_category
    if not category:
        return "Expenses:Uncategorized"
    
    primary = category.primary
    detailed = category.detailed
    
    mapping = {
        'FOOD_AND_DRINK': 'Expenses:Food',
        'TRANSPORTATION': 'Expenses:Transportation',
        'TRAVEL': 'Expenses:Travel',
        'ENTERTAINMENT': 'Expenses:Entertainment',
        'GENERAL_MERCHANDISE': 'Expenses:Shopping',
        'PERSONAL_CARE': 'Expenses:Personal',
        'MEDICAL': 'Expenses:Health',
        'RENT_AND_UTILITIES': 'Expenses:Housing',
    }
    
    return mapping.get(primary, f"Expenses:{primary.title()}")

Step 5: Write to File

from beancount.parser import printer

def write_transactions(transactions, output_file):
    """Write Beancount transactions to a file."""
    with open(output_file, 'w') as f:
        for txn in transactions:
            f.write(printer.format_entry(txn))
            f.write('\n')

# Usage
plaid_txns = get_transactions(access_token)
beancount_txns = [
    transaction_to_beancount(t, account_mapping) 
    for t in plaid_txns
]
write_transactions(beancount_txns, 'imports/bank.beancount')

Step 6: Deduplication

A critical piece - don’t re-import transactions you’ve already processed:

import json
import os

PROCESSED_FILE = 'processed_transactions.json'

def load_processed():
    if os.path.exists(PROCESSED_FILE):
        with open(PROCESSED_FILE) as f:
            return set(json.load(f))
    return set()

def save_processed(processed):
    with open(PROCESSED_FILE, 'w') as f:
        json.dump(list(processed), f)

def filter_new_transactions(transactions, processed):
    return [t for t in transactions if t.transaction_id not in processed]

Security Considerations

Important: Your Plaid access tokens are sensitive. They provide read access to financial data. Store them securely:

import os

# Good - use environment variables
access_token = os.environ.get('PLAID_ACCESS_TOKEN')

# Better - use a secrets manager
from secretmanager import get_secret
access_token = get_secret('plaid-access-token')

# NEVER hard-code tokens or commit them to version control

Full Script Example

I’ve put together a complete working example in a GitHub gist (hypothetical link) that ties all this together.

Questions and Discussion

  1. What error handling would you add for production use?
  2. Anyone interested in collaborating on an open-source alternative to plaid2text?
  3. How do you handle the initial Link flow in a headless/CLI environment?

Happy to dive deeper into any part of this!

This is a fantastic guide! For question #1 on error handling, here’s what I’d add for production use:

Essential Error Handling

import logging
from plaid.exceptions import PlaidError

logger = logging.getLogger(__name__)

def get_transactions_safe(access_token, start_date, end_date):
    """Fetch transactions with comprehensive error handling."""
    try:
        return get_transactions(access_token, start_date, end_date)
    
    except PlaidError as e:
        error_code = e.code
        
        if error_code == 'ITEM_LOGIN_REQUIRED':
            logger.warning(f"Re-authentication needed for account")
            # Trigger re-link flow
            return None, 'REAUTH_NEEDED'
            
        elif error_code == 'RATE_LIMIT_EXCEEDED':
            logger.warning("Rate limit hit, backing off...")
            time.sleep(60)  # Wait a minute
            return get_transactions_safe(access_token, start_date, end_date)
            
        elif error_code == 'INTERNAL_SERVER_ERROR':
            logger.error("Plaid server error, will retry later")
            return None, 'RETRY_LATER'
            
        else:
            logger.error(f"Plaid error: {error_code} - {e.message}")
            raise
            
    except Exception as e:
        logger.exception(f"Unexpected error fetching transactions")
        raise

Handling Pending Transactions

Something the original guide doesn’t cover - Plaid returns pending transactions that may change:

def fetch_and_filter(access_token, start_date, end_date, include_pending=False):
    """Fetch transactions, optionally filtering out pending."""
    txns = get_transactions(access_token, start_date, end_date)
    
    if not include_pending:
        txns = [t for t in txns if not t.pending]
    
    return txns

I’d recommend NOT importing pending transactions and letting them settle first.

Great guide! On question #3 about handling the Link flow in a CLI/headless environment - this is actually one of the trickiest parts.

CLI-Friendly Link Flow Options

Option 1: Local Flask Server (What I Use)

I run a minimal local web server just for the Link flow:

from flask import Flask, request, render_template_string
import webbrowser

app = Flask(__name__)

LINK_TEMPLATE = '''
<!DOCTYPE html>
<html>
<head>
    <script src="https://cdn.plaid.com/link/v2/stable/link-initialize.js"></script>
</head>
<body>
    <script>
        var handler = Plaid.create({
            token: '{{ link_token }}',
            onSuccess: function(public_token, metadata) {
                fetch('/callback', {
                    method: 'POST',
                    headers: {'Content-Type': 'application/json'},
                    body: JSON.stringify({public_token: public_token})
                }).then(() => {
                    document.body.innerHTML = '<h1>Success! You can close this window.</h1>';
                });
            },
            onExit: function(err, metadata) {
                document.body.innerHTML = '<h1>Link cancelled. Close this window.</h1>';
            }
        });
        handler.open();
    </script>
</body>
</html>
'''

@app.route('/')
def link_page():
    return render_template_string(LINK_TEMPLATE, link_token=current_link_token)

@app.route('/callback', methods=['POST'])
def callback():
    global received_public_token
    received_public_token = request.json['public_token']
    return 'OK'

def run_link_flow(link_token):
    global current_link_token
    current_link_token = link_token
    
    # Open browser to local server
    webbrowser.open('http://localhost:5000')
    
    # Run server until we get the token
    app.run(port=5000)

It’s not pretty, but it works and keeps everything local.

Option 2: Plaid’s Own Quickstart

Plaid has a quickstart app you can run locally. Clone their repo and run it once to link accounts, then use the access tokens in your pipeline.

Option 3: One-Time Manual Setup

For personal use, I’ve also just done the Link flow once through Plaid’s dashboard, saved the access token securely, and then run my CLI scripts against that. Not scalable, but works for individual accounts.

A quick note on the security section - this is really important and I want to emphasize it further:

Record Keeping for API Credentials

If you’re using this for business purposes, your Plaid credentials become part of your financial record-keeping system. Consider:

  1. Document your setup: Keep notes on when you created the Plaid account, what access tokens are connected to which accounts, and when you last verified the connections work.

  2. Backup strategy: If you lose your access tokens, you’ll need to re-link all accounts. Keep encrypted backups of your configuration.

  3. Succession planning: If something happens to you, can someone else access the business’s financial data? Consider documenting the setup for a trusted partner or accountant.

Security Best Practices Expanded

# Use a dedicated secrets file, not environment variables alone
import yaml
from pathlib import Path

def load_secrets():
    secrets_file = Path.home() / '.config' / 'plaid-beancount' / 'secrets.yaml'
    
    # Ensure file has restrictive permissions
    if secrets_file.exists():
        mode = secrets_file.stat().st_mode & 0o777
        if mode != 0o600:
            raise PermissionError(
                f"Secrets file has insecure permissions: {oct(mode)}. "
                f"Run: chmod 600 {secrets_file}"
            )
    
    with open(secrets_file) as f:
        return yaml.safe_load(f)

Also, remember that access tokens should be rotated periodically. Plaid doesn’t force this, but it’s good security hygiene. Re-link accounts annually at minimum.

This is the kind of thing that’s not exciting but will save you major headaches down the road.