Note: this no longer works as TikTok now blocks cross-origin media requests.
As part of a project centered around TikTok (coming soon!), I needed a way to get a video's media. The closest thing that TikTok officially provides is their oembed API, but unfortunately this only includes basic information like title, thumbnail_url, etc. Other unofficial APIs cost money or require Python, neither of which I wanted. So I needed another way to retrieve this info.
Leveraging SSR patterns 📦
Since TikTok is a website that handles large volumes of traffic, I knew they had some sort of SSR/client-side hydration going on. This generally involves shipping data to the client in an object inlined inside a <script> tag, reducing the number of necessary client-side requests by directly populating components with this data. In Next.js, for example, this happens via server-side props.
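To make the pattern concrete, here is a minimal sketch of what the server side of it could look like. The function name, the APP_STATE id, and the markup are all made up for illustration; this is not TikTok's or Next.js's actual code.

```typescript
// Hypothetical server-side helper: inlines page state as JSON in a <script> tag
// so the client can hydrate from it without an extra request.
function renderWithState(markup: string, state: Record<string, unknown>): string {
  // Escape "<" so a string inside the state cannot close the script tag early.
  const json = JSON.stringify(state).replace(/</g, "\\u003c");
  return `${markup}<script id="APP_STATE" type="application/json">${json}</script>`;
}

// On the client, the same data is read back with a single lookup, e.g.:
//   JSON.parse(document.getElementById("APP_STATE").textContent)
```

Because the state ships with the markup, anyone who can fetch the page can read it too, which is what the rest of this post relies on.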
And lo and behold! There's a handy blob provided alongside the page markup for every video. You can confirm this for yourself by visiting any TikTok video's page (https://www.tiktok.com/:user/video/:id) and looking at the contents of the SIGI_STATE tag via the console:
JSON.parse(document.getElementById("SIGI_STATE").innerText);
Knowing this, I built a simple handler using Deno to fetch this blob and return it directly.
Enter cheerio & deno 🥣🦕
If the video URL is attached to the request (via POST body or params), a simple handler can fetch (scrape) the content for the page:
const response = await fetch(`https://tiktok.com/${video}`);
const html = await response.text();
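Since the handler interpolates a caller-supplied value straight into the fetch URL, it may be worth validating it first. The sketch below is one way to do that. The helper name and the accepted shape (@user/video/<numeric id>) are my assumptions based on the page URL pattern above, not something TikTok documents.

```typescript
// Hypothetical helper: accepts a full TikTok video URL or a bare path and
// returns a normalized "@user/video/id" path, or null if it doesn't look like one.
const VIDEO_PATH = /^\/?(@[\w.-]+)\/video\/(\d+)\/?$/;

function normalizeVideoPath(input: string): string | null {
  // Strip an optional scheme and TikTok host so both full URLs and bare paths work.
  const withoutHost = input.trim().replace(/^https?:\/\/(www\.)?tiktok\.com/i, "");
  // Drop any query string or fragment before matching.
  const pathOnly = withoutHost.split(/[?#]/)[0] ?? "";
  const match = VIDEO_PATH.exec(pathOnly);
  return match ? `${match[1]}/video/${match[2]}` : null;
}
```

A null result can be turned into a 400 response before any request to TikTok is made, which also keeps the handler from being used to fetch arbitrary URLs.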
Then, using cheerio, this raw text can be turned into a queryable jQuery-like API. This provides an entrypoint to get the content of the server-generated blob within the script tag:
const $ = cheerio.load(html);
const appContext = $("#SIGI_STATE").text();
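If pulling in cheerio feels heavy for a single lookup, a regex can do the same job. This is a swapped-in technique rather than what my handler uses, and it is even more fragile than a real parser: it assumes the id attribute is quoted and that the JSON never contains a literal closing script tag. The helper name is made up for this sketch.

```typescript
// Dependency-free alternative to the cheerio lookup (an assumption-laden sketch).
function extractScriptById(html: string, id: string): string | null {
  // Escape regex metacharacters so the id is matched literally.
  const safeId = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `<script\\b[^>]*\\bid=["']${safeId}["'][^>]*>([\\s\\S]*?)</script>`,
    "i",
  );
  const match = pattern.exec(html);
  // Return the tag's text content, or null when the tag is absent.
  return match?.[1]?.trim() ?? null;
}
```

Calling extractScriptById(html, "SIGI_STATE") would stand in for the two cheerio lines above.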
From there, it's only a matter of parsing this as JSON and returning the pertinent data (ItemModule) in the response. ItemModule is currently keyed by some unique identifier, so Object.keys is used to grab the first entry without needing to know that key:
const json = JSON.parse(appContext);
const key = Object.keys(json.ItemModule)[0];
const data = json.ItemModule[key];
return new Response(JSON.stringify({ ...data }), {
headers: { 'Content-Type': 'application/json' },
status: 200,
});
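The snippet above throws if the tag is missing, the JSON is malformed, or ItemModule is empty. A more defensive version could look like the sketch below. The function name and result type are hypothetical, and real ItemModule entries carry many more fields than the type suggests.

```typescript
// Hypothetical result type for the lookup.
type ItemLookup =
  | { ok: true; item: Record<string, unknown> }
  | { ok: false; reason: string };

function firstItem(rawState: string): ItemLookup {
  let state: unknown;
  try {
    state = JSON.parse(rawState);
  } catch {
    return { ok: false, reason: "SIGI_STATE was not valid JSON" };
  }
  // ItemModule may be absent if the page shape changes.
  const items = (state as { ItemModule?: Record<string, unknown> } | null)?.ItemModule;
  if (typeof items !== "object" || items === null) {
    return { ok: false, reason: "ItemModule missing" };
  }
  // Same trick as above: take the first key without knowing it in advance.
  const [key] = Object.keys(items);
  const item = key === undefined ? undefined : items[key];
  if (typeof item !== "object" || item === null) {
    return { ok: false, reason: "ItemModule was empty" };
  }
  return { ok: true, item: item as Record<string, unknown> };
}
```

In the handler, a failed lookup can then map to an error response (say, a 502 with the reason in the body) instead of an unhandled exception.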
And that's it! Now I can make a client-side request to get up-to-date media for a video and display it directly. This also gets around TikTok's CDN expiration, which would otherwise prevent saving a reference to this media.
Side-note: comparing this to the output for one of the unofficial API options I came across, the data looks identical. So it's likely they're doing the same thing (but charging almost 00/mo for unlimited usage)!
Anyways, this is fragile and could break if they decide to rename the script tag, rate-limit, etc. But this demonstrates how easy it is to spin up an instance for your own project. And maybe with enough bot traffic TikTok will create a public API that accomplishes this instead of making us scrape for it! 😇