MCTF25{t4Lk_b1rdY_t0_m3}
I found this dusty old AI translator device in a pawn shop, so I tried translating the flag, and it worked! Later I wanted to play with it some more, but totally forgot what the flag is. Now all I have is /flag.wav, but maybe you can help me translate it back?
IP: 10.240.2.50
Hint: AI audio translator; /flag.wav on the box.
The goal: recover the original MCTF25{...} flag from a WAV file that was generated by an “AI translator” web service.
First, scan the box to see what’s running:
nmap -sC -sV 10.240.2.50
Result (relevant part):
5000/tcp open http Werkzeug httpd 3.1.3 (Python 3.12.12)
|_http-title: AI Translator
|_http-server-header: Werkzeug/3.1.3 Python/3.12.12
So it’s a small Python web app (very likely Flask) on port 5000, serving the “AI Translator”.
Grab the WAV file mentioned in the description:
curl -v http://10.240.2.50:5000/flag.wav -o flag.wav
file flag.wav
Output:
flag.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
So /flag.wav is indeed a valid audio file.
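The same header check can be done from Python with the standard-library wave module. This is a minimal sketch: the helper names describe_wav and make_silent_wav are my own, and the demo builds a tiny in-memory WAV instead of reading the real flag.wav.

```python
import io
import wave


def describe_wav(source):
    """Return basic header fields of a PCM WAV (path or file-like object)."""
    with wave.open(source, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def make_silent_wav(seconds=1, rate=44100):
    """Build a mono 16-bit silent WAV in memory (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))
    buf.seek(0)
    return buf


info = describe_wav(make_silent_wav())
print(info)
```

On the real file, describe_wav("flag.wav") should agree with the `file` output above (mono, 2 bytes per sample, 44100 Hz).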
Check the main page:
curl -s http://10.240.2.50:5000/ | head -n 80
Relevant part of the HTML/JS:
<form id="form">
  <label for="inputText">Enter your text:</label>
  <textarea id="text" required maxlength="100"></textarea>
  <button type="submit">Translate</button>
  ...
</form>
<script>
  document.getElementById('form').addEventListener('submit', async function(e) {
    e.preventDefault();
    const text = document.getElementById('text').value;
    const resp = await fetch('/translate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text })
    });
    const blob = await resp.blob();
    ...
  });
</script>
Key points:
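- The form sends a POST /translate request.
- The request uses Content-Type: application/json.
- The body is { "text": "<your text>" }.
- The response is an audio blob (a WAV like flag.wav).
So we cannot just POST audio and get text back. Instead, we need to reverse what the translator does.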
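For scripting, the request the page makes can be reproduced with the standard library. The sketch below is an assumption-laden stand-in for the browser's fetch call: the function names are mine, the base URL is the challenge IP from above, and only build_translate_request runs in the demo (nothing is sent).

```python
import json
import urllib.request


def build_translate_request(text, base_url="http://10.240.2.50:5000"):
    """Build (but do not send) the JSON POST that the page's JavaScript issues."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/translate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_to_file(text, out_path, base_url="http://10.240.2.50:5000"):
    """Send the request and save the returned audio bytes to out_path."""
    req = build_translate_request(text, base_url)
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(data)
    return len(data)


# Inspect the request without touching the network.
req = build_translate_request("MCTF25{")
print(req.get_method(), req.full_url, req.data)
```

Calling translate_to_file("MCTF25{", "prefix.wav") against the live box would be the Python equivalent of the curl commands used later in this writeup.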
On the VM, play or view the audio:
# One of these, depending on what’s installed
aplay flag.wav
# or
ffplay flag.wav
Subjective analysis:
You can also open flag.wav in Audacity, switch the track to Spectrogram view, and visually see the repeating beep “bars” that hint at a structured encoding.
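With some analysis tooling (Python, FFT, etc.), you can observe that flag.wav contains 52 beeps in total.
The pattern is: 2 start-marker beeps, then 2 beeps per character, then 2 end-marker beeps.
That gives:
(52 total beeps - 4 marker beeps) / 2 = 24 characters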
So the flag encoded in flag.wav is 24 characters long.
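The arithmetic can be written as a small helper. This is only a sketch: the function name and its defaults (4 marker beeps, 2 beeps per character) are taken from the numbers above, not from any knowledge of the server's code.

```python
def encoded_length(total_beeps, marker_beeps=4, beeps_per_char=2):
    """Number of characters implied by a beep count under the observed framing."""
    payload = total_beeps - marker_beeps
    if payload < 0 or payload % beeps_per_char != 0:
        raise ValueError("beep count does not fit the marker + pairs pattern")
    return payload // beeps_per_char


print(encoded_length(52))  # 24
```

A count that does not divide evenly usually means the beep segmentation threshold needs adjusting, not that the pattern is wrong.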
The challenge text says /flag.wav was produced by translating the flag with the AI translator. Since we also have access to the live translator, we can compare its output with flag.wav.
First, generate a WAV from a guessed prefix:
curl -s -H "Content-Type: application/json" -d '{"text":"MCTF25{"}' http://10.240.2.50:5000/translate -o prefix.wav
Then open both flag.wav and prefix.wav in Audacity and switch their tracks to Spectrogram view. Visually comparing the spectrograms shows:
The spectrogram of MCTF25{ in prefix.wav matches exactly the beginning of flag.wav.
So we can confidently conclude that the flag starts with:
MCTF25{
This also confirms the encoding is deterministic and that each character is represented by a distinctive pair of beep frequencies. From there, we can use the translator as an oracle and brute-force the remaining characters by comparing beep pairs.
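The visual comparison can also be expressed numerically once each file is reduced to a list of frequency pairs. The sketch below assumes such lists already exist; the 5 Hz tolerance and the example frequencies are made up for illustration.

```python
def pairs_match(a, b, tol_hz=5.0):
    """True if both frequencies of pair a are within tol_hz of pair b."""
    return abs(a[0] - b[0]) <= tol_hz and abs(a[1] - b[1]) <= tol_hz


def is_prefix(prefix_pairs, flag_pairs, tol_hz=5.0):
    """True if prefix_pairs matches the start of flag_pairs pair by pair."""
    if len(prefix_pairs) > len(flag_pairs):
        return False
    return all(pairs_match(p, f, tol_hz) for p, f in zip(prefix_pairs, flag_pairs))


# Made-up frequencies, only to show the call shape.
flag_pairs = [(440.0, 880.0), (523.3, 659.3), (587.3, 784.0)]
prefix_pairs = [(441.0, 879.0), (523.3, 660.0)]
print(is_prefix(prefix_pairs, flag_pairs))
```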
Instead of trying to figure out the full mapping from frequencies to characters, we can do this character by character:
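1. Start with the known prefix MCTF25{.
2. For each position i in the flag:
   - Take the known prefix up to position i-1.
   - Try each candidate for position i from a set of allowed characters (e.g. A–Z, a–z, 0–9, {, }, _).
   - For each candidate c, send <prefix_so_far><c> to the translator:
     curl -s -H "Content-Type: application/json" -d '{"text":"<prefix><candidate>"}' http://10.240.2.50:5000/translate -o test.wav
   - Extract the beep pair at position i from the generated test.wav.
   - Compare it with the beep pair at position i in flag.wav.
   - If they match, c is the correct character at position i.
3. Repeat until you reach the closing }.
This reduces the problem to pattern matching, not full-blown audio decoding.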
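The loop can be sketched independently of the audio handling by treating the translator as a function from text to a list of frequency pairs. In the sketch below, recover and toy_encode are hypothetical names, and toy_encode is a made-up stand-in for the real service so the example runs offline.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "{}_"


def recover(target_pairs, encode, known_prefix="", alphabet=ALPHABET):
    """Recover text one character at a time using encode() as an oracle."""
    recovered = known_prefix
    for pos in range(len(known_prefix), len(target_pairs)):
        for ch in alphabet:
            pairs = encode(recovered + ch)
            if len(pairs) > pos and pairs[pos] == target_pairs[pos]:
                recovered += ch
                break
        else:
            # No candidate produced the target pair at this position.
            raise LookupError(f"no candidate matched at position {pos}")
    return recovered


def toy_encode(text):
    """Stand-in oracle: maps each character to an invented frequency pair."""
    return [(400 + 10 * (ord(c) % 16), 1000 + 10 * (ord(c) // 16)) for c in text]


secret = "MCTF25{demo_Text}"
print(recover(toy_encode(secret), toy_encode, known_prefix="MCTF25{"))
```

With the real service, encode would POST the text to /translate and extract pairs from the returned WAV, which costs at most one request per candidate per position.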
We already know:
Index  0  1  2  3  4  5  6
Char   M  C  T  F  2  5  {
Using the oracle approach and comparing beep pairs (helped by inspecting spectrograms in Audacity and confirming that each character maps to a unique frequency pair), we brute-forced each position from index 7 onward until we got the closing }.
Step by step, the recovered characters form:
MCTF25{t4Lk_b1rdY_t0_m3}
This matches:
- The expected length of 24 characters ((52 - 4) / 2).
- The flag format MCTF25{...}.
Final flag: MCTF25{t4Lk_b1rdY_t0_m3}
Below are the key Python scripts used during analysis and solving. Some were exploratory (Morse/binary attempts), others were part of the final oracle-based solution.
decode_beeps.py – initial Morse-style attempt (dead end)
#!/usr/bin/env python3
import wave
import struct
import sys

MORSE_TABLE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z",
    "-----": "0", ".----": "1", "..---": "2", "...--": "3", "....-": "4",
    ".....": "5", "-....": "6", "--...": "7", "---..": "8", "----.": "9",
}

def load_wav(path):
    """Read a 16-bit PCM WAV and return (samples, framerate)."""
    with wave.open(path, "rb") as wf:
        n_channels = wf.getnchannels()
        n_frames = wf.getnframes()
        frames = wf.readframes(n_frames)
        sampwidth = wf.getsampwidth()
        framerate = wf.getframerate()
    if sampwidth != 2:
        raise RuntimeError(f"Unsupported sample width: {sampwidth} bytes")
    samples = struct.unpack("<" + "h" * (len(frames) // 2), frames)
    if n_channels != 1:
        print("Warning: not mono, using first channel only")
        samples = samples[::n_channels]  # samples are interleaved per channel
    return samples, framerate

def detect_beeps(samples, framerate, window_ms=10, threshold_factor=0.3):
    """Threshold the mean absolute amplitude per window into ON (1) / OFF (0)."""
    window_size = int(framerate * window_ms / 1000)
    if window_size <= 0:
        window_size = 1
    mags = []
    for i in range(0, len(samples), window_size):
        chunk = samples[i : i + window_size]
        if not chunk:
            break
        avg = sum(abs(s) for s in chunk) / len(chunk)
        mags.append(avg)
    max_mag = max(mags) if mags else 1.0
    threshold = max_mag * threshold_factor
    bits = [1 if m >= threshold else 0 for m in mags]
    return bits, window_size

def compress_runs(bits):
    """Run-length encode the bit list into (value, length) tuples."""
    runs = []
    if not bits:
        return runs
    cur = bits[0]
    length = 1
    for b in bits[1:]:
        if b == cur:
            length += 1
        else:
            runs.append((cur, length))
            cur = b
            length = 1
    runs.append((cur, length))
    return runs

def classify_morse(runs):
    """Turn ON/OFF runs into a Morse string using length-based heuristics."""
    beep_lengths = [l for v, l in runs if v == 1]
    gap_lengths = [l for v, l in runs if v == 0]
    if not beep_lengths:
        print("No beeps detected")
        return ""
    min_beep = min(beep_lengths)
    min_gap = min(gap_lengths) if gap_lengths else min_beep
    dot_dash_boundary = min_beep * 1.5
    intra_letter_boundary = min_gap * 1.5
    letter_gap_boundary = min_gap * 3.5
    morse = []
    current_symbol = ""

    def flush_symbol():
        nonlocal current_symbol
        if current_symbol:
            morse.append(current_symbol)
            current_symbol = ""

    for value, length in runs:
        if value == 1:
            # Short beep is a dot, long beep is a dash.
            if length <= dot_dash_boundary:
                current_symbol += "."
            else:
                current_symbol += "-"
        else:
            if length <= intra_letter_boundary:
                pass  # gap inside a letter
            elif length <= letter_gap_boundary:
                flush_symbol()  # gap between letters
            else:
                flush_symbol()  # gap between words
                morse.append(" / ")
    flush_symbol()
    return " ".join(morse)

def morse_to_text(morse):
    """Decode a space-separated Morse string; unknown symbols become '?'."""
    out = []
    for token in morse.split(" "):
        if token == "/":
            out.append(" ")
        elif token.strip() == "":
            continue
        else:
            ch = MORSE_TABLE.get(token)
            if ch:
                out.append(ch)
            else:
                out.append("?")
    return "".join(out)

def main():
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} flag.wav")
        sys.exit(1)
    path = sys.argv[1]
    samples, framerate = load_wav(path)
    bits, window_size = detect_beeps(samples, framerate)
    runs = compress_runs(bits)
    print(f"Detected {len(runs)} beep/silence runs")
    morse = classify_morse(runs)
    print("Morse guess:")
    print(morse)
    text = morse_to_text(morse)
    print("Decoded text guess:")
    print(text)

if __name__ == "__main__":
    main()
The output ruled out Morse (everything decoded as EEEEEE...), which pushed the analysis towards a binary or pitch-based encoding.
analyze_beeps.py – binary run-length inspection
#!/usr/bin/env python3
import wave, struct, sys
from math import gcd

def load_wav(path):
    """Read a 16-bit PCM WAV and return (samples, framerate)."""
    with wave.open(path, "rb") as wf:
        n_channels = wf.getnchannels()
        n_frames = wf.getnframes()
        frames = wf.readframes(n_frames)
        sampwidth = wf.getsampwidth()
        framerate = wf.getframerate()
    if sampwidth != 2:
        raise RuntimeError(f"Unsupported sample width: {sampwidth} bytes")
    samples = struct.unpack("<" + "h" * (len(frames) // 2), frames)
    if n_channels != 1:
        print("Warning: not mono, using first channel only")
        samples = samples[::n_channels]  # samples are interleaved per channel
    return samples, framerate

def detect_beeps(samples, framerate, window_ms=5, threshold_factor=0.4):
    """Threshold the mean absolute amplitude per window into ON (1) / OFF (0)."""
    window_size = int(framerate * window_ms / 1000)
    if window_size <= 0:
        window_size = 1
    mags = []
    for i in range(0, len(samples), window_size):
        chunk = samples[i : i + window_size]
        if not chunk:
            break
        avg = sum(abs(s) for s in chunk) / len(chunk)
        mags.append(avg)
    max_mag = max(mags) if mags else 1.0
    threshold = max_mag * threshold_factor
    bits = [1 if m >= threshold else 0 for m in mags]
    return bits, window_size

def compress_runs(bits):
    """Run-length encode the bit list into (value, length) tuples."""
    runs = []
    if not bits:
        return runs
    cur = bits[0]
    length = 1
    for b in bits[1:]:
        if b == cur:
            length += 1
        else:
            runs.append((cur, length))
            cur = b
            length = 1
    runs.append((cur, length))
    return runs

def main():
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} flag.wav")
        sys.exit(1)
    samples, framerate = load_wav(sys.argv[1])
    bits, ws = detect_beeps(samples, framerate)
    runs = compress_runs(bits)
    print(f"Window size: {ws} samples")
    print(f"Total runs: {len(runs)}")
    on_lengths = sorted(set(l for v, l in runs if v == 1))
    off_lengths = sorted(set(l for v, l in runs if v == 0))
    print("Unique ON lengths:", on_lengths)
    print("Unique OFF lengths:", off_lengths)
    # Use the GCD of all run lengths as the candidate "bit time".
    g = 0
    for _, l in runs:
        g = gcd(g, l) if g else l
    if g == 0:
        print("No runs?")
        return
    print("GCD of run lengths:", g)
    norm = [(v, l // g) for v, l in runs]
    print("First 40 normalized runs (value, units):")
    print(norm[:40])
    # Expand each run into repeated bits and try to read it as 8-bit ASCII.
    bitstream = []
    for v, units in norm:
        bitstream.extend([str(v)] * units)
    bitstring = "".join(bitstream)
    print("First 160 bits of bitstream:")
    print(bitstring[:160])
    print("\nASCII attempt from bitstream (8 bits per char, from start):")
    chars = []
    for i in range(0, len(bitstring) - 7, 8):
        b = bitstring[i:i+8]
        val = int(b, 2)
        if 32 <= val <= 126:
            chars.append(chr(val))
        else:
            chars.append(".")
    ascii_guess = "".join(chars)
    print(ascii_guess[:80])

if __name__ == "__main__":
    main()
This showed very regular ON/OFF lengths, hinting that timing wasn’t carrying the main information; instead the frequencies were.
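To illustrate why pitch is easy to recover from a clean beep, here is a small standard-library sketch that estimates a tone's frequency from its zero crossings. It is a simpler stand-in for the FFT peak-picking used in the later scripts, not a copy of it, and the test tone is synthetic.

```python
import math


def sine_tone(freq_hz, duration_s=0.05, sample_rate=44100):
    """Generate a pure sine tone as a list of floats in [-1, 1]."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_freq(samples, sample_rate=44100):
    """Estimate the frequency of a clean tone from its rising zero crossings."""
    crossings = [
        i for i in range(1, len(samples))
        if samples[i - 1] < 0 <= samples[i]
    ]
    if len(crossings) < 2:
        return None  # too short or silent to measure a period
    periods = len(crossings) - 1
    span = crossings[-1] - crossings[0]
    return periods * sample_rate / span


for f in (440, 1000, 2500):
    print(f, round(estimate_freq(sine_tone(f)), 1))
```

Zero crossings only work for a single clean tone; for noisy or mixed signals an FFT peak is the more robust choice.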
decode_flag.py – frequency mapping via a training WAV
This version used NumPy + a training WAV (train.wav) generated by sending a known alphabet to /translate. It learned a mapping from beep frequency pairs → characters, then applied it to flag.wav.
#!/usr/bin/env python3
import wave
import numpy as np

# MUST match the text sent to /translate for train.wav
TRAIN_TEXT = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789{}_"

def load_wav(path):
    """Read a 16-bit PCM WAV and return (float samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wf:
        fr = wf.getframerate()
        n = wf.getnframes()
        ch = wf.getnchannels()
        sw = wf.getsampwidth()
        data = wf.readframes(n)
    if sw != 2:
        raise RuntimeError(f"Unsupported sample width: {sw}")
    samples = np.frombuffer(data, dtype="<i2").astype(np.float32) / 32768.0
    if ch > 1:
        samples = samples.reshape(-1, ch)[:, 0]
    return samples, fr

def segment_beeps(samples, sr, smooth_ms=5, thresh_ratio=0.3, min_beep_ms=20):
    """Return (start, end) sample indices of beeps found via a smoothed envelope."""
    win = int(sr * smooth_ms / 1000)
    if win < 1:
        win = 1
    env = np.convolve(np.abs(samples), np.ones(win) / win, mode="same")
    thr = env.max() * thresh_ratio  # threshold as a fraction of the peak envelope
    mask = env > thr
    beeps = []
    n = len(mask)
    i = 0
    while i < n:
        v = mask[i]
        j = i + 1
        while j < n and mask[j] == v:
            j += 1
        length = j - i
        if v:
            # Ignore ON runs that are too short to be a real beep.
            if length >= sr * min_beep_ms / 1000:
                beeps.append((i, j))
        i = j
    return beeps

def beep_freqs(samples, sr, beeps):
    """Return the dominant frequency (Hz, 1 decimal) of each beep via FFT."""
    freqs = []
    for start, end in beeps:
        length = end - start
        n = min(length, 2048)
        if n < 256:
            continue
        seg = samples[start:start+n]
        N = 1
        while N < n:
            N *= 2  # zero-pad to the next power of two
        w = np.hanning(n)
        seg_win = seg * w
        spec = np.fft.rfft(seg_win, n=N)
        mag = np.abs(spec)
        mag[0] = 0  # ignore the DC component
        k = np.argmax(mag)
        freq = k * sr / N
        freqs.append(round(freq, 1))
    return freqs

def pairs_from_freqs(freqs):
    """Drop the 2 start and 2 end marker beeps, then group the rest in pairs."""
    mid = freqs[2:-2]
    assert len(mid) % 2 == 0
    return [(mid[i*2], mid[i*2+1]) for i in range(len(mid)//2)]

def main():
    flag_samples, sr1 = load_wav("flag.wav")
    train_samples, sr2 = load_wav("train.wav")
    if sr1 != sr2:
        raise RuntimeError("Sample rates differ")
    flag_beeps = segment_beeps(flag_samples, sr1)
    train_beeps = segment_beeps(train_samples, sr2)
    flag_freqs = beep_freqs(flag_samples, sr1, flag_beeps)
    train_freqs = beep_freqs(train_samples, sr2, train_beeps)
    print(f"Flag beeps: {len(flag_freqs)}, Train beeps: {len(train_freqs)}")
    flag_pairs = pairs_from_freqs(flag_freqs)
    train_pairs = pairs_from_freqs(train_freqs)
    print(f"Flag pairs: {len(flag_pairs)}, Train pairs: {len(train_pairs)}")
    if len(train_pairs) != len(TRAIN_TEXT):
        print("Warning: TRAIN_TEXT length and train_pairs length differ!")
        print(f"TRAIN_TEXT length: {len(TRAIN_TEXT)}")
    # Learn pair -> character from the training audio.
    pair_map = {}
    for i, p in enumerate(train_pairs):
        if i >= len(TRAIN_TEXT):
            break
        key = (round(p[0], 1), round(p[1], 1))
        pair_map[key] = TRAIN_TEXT[i]
    # Apply the learned mapping to the flag audio.
    decoded = []
    unknown = []
    for p in flag_pairs:
        key = (round(p[0], 1), round(p[1], 1))
        ch = pair_map.get(key, "?")
        decoded.append(ch)
        if ch == "?":
            unknown.append(key)
    decoded_text = "".join(decoded)
    print("Decoded flag guess:")
    print(decoded_text)
    if unknown:
        print("Unknown pairs (no mapping in training):")
        for u in sorted(set(unknown)):
            print(u)

if __name__ == "__main__":
    main()
This was useful to inspect how many pairs matched / didn’t match, but the final solve used a more direct oracle approach.
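One plausible reason for unmatched pairs in decode_flag.py is that its lookup needs exact equality on rounded frequencies, so a one-bin FFT shift is enough to miss. A tolerance-based lookup is a possible alternative. The sketch below is hypothetical: nearest_char, the 10 Hz tolerance, and the sample map are my own and were not part of the original solve.

```python
def nearest_char(pair, pair_map, tol_hz=10.0):
    """Return the character whose trained pair is closest to `pair`, or '?'."""
    best_char, best_dist = None, None
    for (f1, f2), ch in pair_map.items():
        # Distance is the larger of the two per-beep frequency errors.
        dist = max(abs(pair[0] - f1), abs(pair[1] - f2))
        if best_dist is None or dist < best_dist:
            best_char, best_dist = ch, dist
    if best_dist is None or best_dist > tol_hz:
        return "?"
    return best_char


# Invented training map, only to show the call shape.
pair_map = {(440.0, 880.0): "A", (523.3, 659.3): "B"}
print(nearest_char((441.2, 878.9), pair_map))
print(nearest_char((700.0, 700.0), pair_map))
```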
analyze_flag.py – extracting frequency pairs from flag.wav
This script extracts all character beep pairs from flag.wav and saves them to flag_pairs.txt:
#!/usr/bin/env python3
import wave
import numpy as np

def load_wav(path):
    """Read a 16-bit PCM WAV and return (float samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wf:
        fr = wf.getframerate()
        n = wf.getnframes()
        ch = wf.getnchannels()
        sw = wf.getsampwidth()
        data = wf.readframes(n)
    assert sw == 2
    samples = np.frombuffer(data, dtype="<i2").astype(np.float32) / 32768.0
    if ch > 1:
        samples = samples.reshape(-1, ch)[:, 0]
    return samples, fr

def segment_beeps(samples, sr, win_ms=10, thresh_ratio=0.2):
    """Return (start, end) sample indices of every above-threshold segment."""
    win = int(sr * win_ms / 1000)
    if win < 1:
        win = 1
    env = np.convolve(np.abs(samples), np.ones(win) / win, mode="same")
    thr = env.max() * thresh_ratio
    mask = env > thr
    segs = []
    n = len(mask)
    i = 0
    while i < n:
        if not mask[i]:
            i += 1
            continue
        start = i
        while i < n and mask[i]:
            i += 1
        segs.append((start, i))
    return segs

def dominant_freq(samples, sr, start, end):
    """Return the strongest FFT frequency (Hz, 1 decimal) in samples[start:end]."""
    seg = samples[start:end]
    n = len(seg)
    if n < 200:
        return None  # too short to be a real beep
    win = np.hanning(n)
    segw = seg * win
    N = 1
    while N < n:
        N *= 2  # zero-pad to the next power of two
    spec = np.fft.rfft(segw, n=N)
    mag = np.abs(spec)
    mag[0] = 0  # ignore the DC component
    k = np.argmax(mag)
    return round(k * sr / N, 1)

def main():
    samples, sr = load_wav("flag.wav")
    segs = segment_beeps(samples, sr)
    freqs = [dominant_freq(samples, sr, s, e) for s, e in segs]
    freqs = [f for f in freqs if f is not None]
    print(f"Total beep freqs: {len(freqs)}")
    # First and last two beeps are markers; everything between is character pairs.
    start_pair = tuple(freqs[:2])
    end_pair = tuple(freqs[-2:])
    mid = freqs[2:-2]
    pairs = [tuple(mid[i*2:(i+1)*2]) for i in range(len(mid)//2)]
    print("Start marker:", start_pair)
    print("End marker: ", end_pair)
    print("Char pairs (index: (f1, f2)):")
    for i, p in enumerate(pairs):
        print(i, p)
    with open("flag_pairs.txt", "w") as f:
        for p in pairs:
            f.write(f"{p[0]} {p[1]}\n")

if __name__ == "__main__":
    main()
test_char.py – oracle-based brute force of each character
This script is the core of the final solution: for a given known prefix and target index, it brute-forces the next character by comparing beep pairs to flag.wav.
#!/usr/bin/env python3
import wave
import numpy as np
import json, sys, subprocess

def load_wav(path):
    """Read a 16-bit PCM WAV and return (float samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wf:
        fr = wf.getframerate()
        n = wf.getnframes()
        ch = wf.getnchannels()
        sw = wf.getsampwidth()
        data = wf.readframes(n)
    assert sw == 2
    samples = np.frombuffer(data, dtype="<i2").astype(np.float32) / 32768.0
    if ch > 1:
        samples = samples.reshape(-1, ch)[:, 0]
    return samples, fr

def segment_beeps(samples, sr, win_ms=10, thresh_ratio=0.2):
    """Return (start, end) sample indices of every above-threshold segment."""
    win = int(sr * win_ms / 1000)
    if win < 1:
        win = 1
    env = np.convolve(np.abs(samples), np.ones(win) / win, mode="same")
    thr = env.max() * thresh_ratio
    mask = env > thr
    segs = []
    n = len(mask)
    i = 0
    while i < n:
        if not mask[i]:
            i += 1
            continue
        start = i
        while i < n and mask[i]:
            i += 1
        segs.append((start, i))
    return segs

def dominant_freq(samples, sr, start, end):
    """Return the strongest FFT frequency (Hz, 1 decimal) in samples[start:end]."""
    seg = samples[start:end]
    n = len(seg)
    if n < 200:
        return None  # too short to be a real beep
    win = np.hanning(n)
    segw = seg * win
    N = 1
    while N < n:
        N *= 2  # zero-pad to the next power of two
    spec = np.fft.rfft(segw, n=N)
    mag = np.abs(spec)
    mag[0] = 0  # ignore the DC component
    k = np.argmax(mag)
    return round(k * sr / N, 1)

def get_pairs(path):
    """Extract the character frequency pairs (markers stripped) from a WAV."""
    samples, sr = load_wav(path)
    segs = segment_beeps(samples, sr)
    freqs = [dominant_freq(samples, sr, s, e) for s, e in segs]
    freqs = [f for f in freqs if f is not None]
    mid = freqs[2:-2]
    pairs = [tuple(mid[i*2:(i+1)*2]) for i in range(len(mid)//2)]
    return pairs

def main():
    if len(sys.argv) != 3:
        print(f"Usage: {sys.argv[0]} <known_prefix> <position_index>")
        sys.exit(1)
    prefix = sys.argv[1]
    pos = int(sys.argv[2])
    # Load the target pairs written by analyze_flag.py.
    flag_pairs = []
    with open("flag_pairs.txt") as f:
        for line in f:
            a, b = line.strip().split()
            flag_pairs.append((float(a), float(b)))
    target_pair = flag_pairs[pos]
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789{}_"
    for ch in alphabet:
        text = prefix + ch
        print(f"Trying {ch} ...", end="", flush=True)
        payload = json.dumps({"text": text})
        cmd = [
            "curl", "-s", "-H", "Content-Type: application/json",
            "-d", payload, "http://10.240.2.50:5000/translate",
            "-o", "test.wav"
        ]
        subprocess.run(cmd, check=True)
        pairs = get_pairs("test.wav")
        if len(pairs) <= pos:
            print(" (too short)")
            continue
        if pairs[pos] == target_pair:
            print(" MATCH")
            print(f"Found char at position {pos}: {ch}")
            return
        else:
            print(" no")
    print("No match found in alphabet")

if __name__ == "__main__":
    main()
Usage example:
# After generating flag_pairs.txt with analyze_flag.py
# and knowing the prefix "MCTF25{"
python3 test_char.py 'MCTF25{' 7    # find char at index 7
python3 test_char.py 'MCTF25{t' 8   # then with updated prefix, etc.
Repeating this for each position eventually yielded the full flag:
MCTF25{t4Lk_b1rdY_t0_m3}