User:Brooke Vibber/Dump build split
Current plan: split the dump build into four threads: one for enwiki, one for the next few largest wikis, one for a few dozen more medium-to-large wikis, and a fourth for everything else.
This should spread the timing out more, make better use of the database servers, etc.
Currently going to run:
- thread 1 (enwiki) on srv31
- thread 2 (large) on benet
- thread 3 (medium) on srv31
- thread 4 (small) on benet
ZOMG
Handy splitter tool
Attempts to break up the database list into similar-sized chunks. Not totally successful. ;)
<?php

$total = 0;
$counts = array();
$threads = 4;
$fudge = 1.0;

// dbsizes.csv is tab-separated: revision count, then database name.
foreach( file("dbsizes.csv") as $line ) {
	list( $revs, $db ) = explode( "\t", trim( $line ) );
	if( $db == "Database" ) continue; // skip the header row
	//echo "$db: $revs\n";
	$counts[] = array( "db" => $db, "revs" => intval( $revs ) );
	$total += intval( $revs );
}

$perthread = intval( $total / $threads );

echo "Total: $total\n";
echo "Desired threads: $threads\n";
echo "Ideal count per thread: $perthread\n";

// Walk the list in file order, filling each thread until it reaches
// the ideal count (scaled by $fudge), then move on to the next thread.
$assignments = array();
$dbindex = 0;
for( $i = 0; $i < $threads; $i++ ) {
	$assignments[$i] = array();
	$dbcount = 0;
	$revcount = 0;
	while( $revcount < $perthread * $fudge && $dbindex < count( $counts ) ) {
		$revcount += $counts[$dbindex]["revs"];
		$assignments[$i][] = $counts[$dbindex];
		$dbindex++;
		$dbcount++;
	}
	echo "Thread $i: $dbcount databases, $revcount revisions\n";
}

// Print each thread's databases in alphabetical order.
foreach( $assignments as $i => $dbs ) {
	echo "\n# Thread $i\n";
	usort( $dbs, 'sortDatabases' );
	foreach( $dbs as $item ) {
		echo $item["db"] . "\n";
	}
}

function sortDatabases( $a, $b ) {
	return strcmp( $a["db"], $b["db"] );
}

?>
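The tool above fills each thread in input order until the ideal count is passed, so the last thread just gets whatever is left over. As a rough sketch of one alternative (not what the tool does), a largest-first greedy pass hands each database to whichever thread currently has the fewest revisions. The function name `balanceDatabases` and the sample sizes below are made up for illustration; the input uses the same "db"/"revs" array shape as the tool.

```php
<?php
// Hypothetical helper, not part of the original tool: largest-first greedy
// assignment. Each database goes to the thread with the fewest revisions so far.
function balanceDatabases( array $counts, int $threads ): array {
	// Place the biggest databases first, while the threads are still empty.
	usort( $counts, function ( $a, $b ) {
		return $b["revs"] <=> $a["revs"];
	} );

	$assignments = array_fill( 0, $threads, array() );
	$loads = array_fill( 0, $threads, 0 );

	foreach ( $counts as $item ) {
		// Index of the least-loaded thread (the first one on ties).
		$target = array_search( min( $loads ), $loads, true );
		$assignments[$target][] = $item["db"];
		$loads[$target] += $item["revs"];
	}

	return array( $assignments, $loads );
}

// Made-up sizes for illustration only; real input would come from dbsizes.csv.
$sample = array(
	array( "db" => "small1",   "revs" => 4 ),
	array( "db" => "hugewiki", "revs" => 50 ),
	array( "db" => "mid1",     "revs" => 10 ),
	array( "db" => "big1",     "revs" => 20 ),
	array( "db" => "small2",   "revs" => 3 ),
	array( "db" => "big2",     "revs" => 15 ),
	array( "db" => "mid2",     "revs" => 8 ),
);

list( $assignments, $loads ) = balanceDatabases( $sample, 3 );

foreach ( $assignments as $i => $dbs ) {
	echo "Thread $i: {$loads[$i]} revisions: " . implode( ", ", $dbs ) . "\n";
}
```

With the sample data, the one oversized database ends up alone on thread 0 (50 revisions) and the rest split 31 and 29 between the other two threads. No assignment can get a thread below the size of the largest single database, so that one will always dominate its thread.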
Suggested splits from the tool
Total: 113956291
Desired threads: 4
Ideal count per thread: 28489072
Thread 0: 1 databases, 48078833 revisions
Thread 1: 4 databases, 29382691 revisions
Thread 2: 40 databases, 28577478 revisions
Thread 3: 635 databases, 7917289 revisions
Thread 0
- enwiki
Thread 1
- dewiki
- frwiki
- nlwiki
- plwiki
Thread 2
- arwiki
- bgwiki
- bgwiktionary
- cawiki
- commonswiki
- cswiki
- dawiki
- dewiktionary
- enwikibooks
- enwikinews
- enwikiquote
- enwiktionary
- eowiki
- eswiki
- etwiki
- fiwiki
- frwiktionary
- hewiki
- hrwiki
- huwiki
- idwiki
- iowiktionary
- itwiki
- ltwiki
- metawiki
- nowiki
- plwiktionary
- ptwiki
- rowiki
- ruwiki
- sep11wiki
- skwiki
- slwiki
- sourceswiki
- srwiki
- svwiki
- trwiki
- ukwiki
- viwiki
- zhwiki
Thread 3
- everything else!