Hi,
So I own small-to-medium sized community with a lot of content (~2.5 mil. posts).
My database hit the hardware limit and I cannot afford to vertically scale any longer.
What I am experiencing are frequent lockups and general performance degradation at this point. No database optimisation helps, it just needs more resources.
I decided to remove content for some users that posted a lot of trash and I know how to do it from the interface.
The problem is that IPS processes tasks in batches, so background task is extremely slow and deleting content for the single user takes ages, even when I deploy the community on my desktop with 16 cores and 64GB RAM.
What I would like to do is to use raw SQL or write my own script in Go to clean this content up, but I'm worried that I'll miss some relations.
So far I think that removing topics involves following tables:
forums_posts
forums_topics
forums_archive_posts
core_attachments
core_social_promote
core_reputation_index
core_follow
I do not update core_search_index because I prefer to rebuild the index afterwards,
But is that all? I know how to craft the query and how to target the posts, but I'm concerned of adverse effects I might experience later?
This is query set I run on dev version when I sync database with production:
DELETE FROM core_members WHERE member_id NOT in(1,1914,1476,1911,1642,1650);
DELETE FROM forums_posts WHERE author_id NOT in(1,1914,1476,1911,1642,1650);
DELETE FROM forums_topics WHERE starter_id NOT in(1,1914,1476,1911,1642,1650);
DELETE FROM archive_posts WHERE archive_author_id NOT in(1,1914,1476,1911,1642,1650);
DELETE FROM core_members_known_devices WHERE member_id NOT in(1,1914,1476,1911,1642,1650);
DELETE FROM core_members_known_ip_addresses WHERE member_id NOT in(1,1914,1476,1911,1642,1650);
DELETE FROM core_member_history WHERE log_member NOT in(1,1914,1476,1911,1642,1650);
DELETE FROM core_follow WHERE follow_member_id NOT in(1,1914,1476,1911,1642,1650);
DELETE FROM core_item_markers WHERE item_member_id NOT in(1,1914,1476,1911,1642,1650);
TRUNCATE core_statistics;
-- DELETE FROM forums_topics WHERE state='link' AND moved_to LIKE CONCAT( '3654&', '%' )
This leaves forum quite clean with some content to test with. It does not leave forum broken.
I am basically trying to reverse engineer content deletion per member task. Code is quite messy and not fun to go through.
What do you guys think?
EDIT: I discovered that some tables like core_output_cache and core_log have grow extremely large. By truncating them, and recreating the database, everything is so much better now, that I might not even need to delete anything else.