Precinct summability of IRV
@last19digitsofpi Hey 8451058495548583234,
I'm not entirely following you.... I think you are speaking of a different thing. You are comparing it to Approval voting, which I will admit allows for much more straightforward precinct summing as well as hand tabulation.
I'm not sure what you mean by "sort all ballots into piles". This might be a hand tabulation step, but it seems to me that you shouldn't have to physically sort them, since the piles then have to be stored separately, and so on. That doesn't work so well when there are hundreds of thousands of ballots in a hand recount.
I'm mostly talking about how the precincts can, during and immediately following an election, send all necessary data so that we can know who is winning, and how close the other candidates are, without a long delay as is typical currently with IRV.
Notice that some ranked ballot elections could have precincts submit pairwise matrices (numCandidates * (numCandidates - 1) numerical values), and that counts as "all necessary data."
IRV requires more data than that, but not THAT much more. Even with tens of millions of ballots, the data is small enough to paste as text into an email or forum post in a readable format.
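To make the pairwise matrix idea concrete, here is a minimal sketch of how a precinct could build one and how two precincts' matrices add up. The function names and the rule that a ranked candidate beats every unranked one are my own assumptions for illustration, not something any particular election office does.

```python
from itertools import permutations

def pairwise_matrix(ballots, candidates):
    """Return {(x, y): number of ballots that rank x above y}."""
    matrix = {pair: 0 for pair in permutations(candidates, 2)}
    for ranking in ballots:
        position = {c: i for i, c in enumerate(ranking)}
        for x, y in matrix:
            # Assumption: a ranked candidate beats any unranked candidate,
            # and two unranked candidates express no preference.
            if x in position and (y not in position or position[x] < position[y]):
                matrix[(x, y)] += 1
    return matrix

def add_matrices(a, b):
    """Precinct summing: add two matrices entry by entry."""
    return {pair: a[pair] + b[pair] for pair in a}

candidates = ["a", "b", "c"]
precinct_1 = [("a", "b", "c"), ("b", "c", "a"), ("a", "c")]
precinct_2 = [("c", "b", "a"), ("b",)]

total = add_matrices(
    pairwise_matrix(precinct_1, candidates),
    pairwise_matrix(precinct_2, candidates),
)
print(len(total))  # numCandidates * (numCandidates - 1) entries
```

Summing the two precinct matrices gives the same result as building one matrix from all the ballots together, which is what makes this kind of data precinct summable.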
Note that Bottom-2-IRV still requires all the ballots to fully tabulate, but because it is Condorcet compliant, the pairwise matrix data goes a lot further. Other methods (Min-Max for instance) can resolve it fully with a matrix.
IRV requires more data than that, but not THAT much more. Even with tens of millions of ballots, the data is small enough to paste as text into an email or forum post in a readable format.
While I am certainly in agreement with you in spirit, in all fairness to the concern, there are some rare cases where the raw ballot data is legitimately huge. Try checking out some of the statewide AU Legislative Council elections (conducted using STV). I believe it was multiple GB.
@andy-dienes Multiple gigs? Holy crap.
I was specifically talking about single winner elections with basic ranked ballots. The election I show above is only 9000 ballots, but the data doesn't grow proportionately to the number of ballots, since it just increments a number when a ballot is added (if a ballot with an identical ranking has already been added).
It varies mostly based on the number of candidates. So if there are hundreds of candidates, sure, that's a lot of data. Still I can't imagine it in the gigabytes, unless something else entirely is going on.
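Here is a rough sketch of what I mean by "it just increments a number". It uses Python's `collections.Counter`; the `summarize` helper and the precinct numbers are made up for illustration.

```python
from collections import Counter

def summarize(ballots):
    """Collapse individual ballots into {ranking: count}."""
    return Counter(tuple(ballot) for ballot in ballots)

# Two hypothetical precincts. Identical rankings share one entry.
precinct_1 = summarize([("a", "b", "c")] * 500 + [("b", "a", "c")] * 300)
precinct_2 = summarize([("a", "b", "c")] * 700 + [("c", "b", "a")] * 200)

# Precinct summing is just adding the counters together.
combined = precinct_1 + precinct_2

print(sum(combined.values()))  # total ballots represented
print(len(combined))           # distinct rankings actually stored
```

The size of `combined` depends on how many distinct rankings show up, not on how many ballots were cast, so doubling the turnout mostly just makes the numbers bigger.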
If you've got such data, I would love to see it. You are speaking of literally a million times as much data as the Burlington election, so I'm at a bit of a loss as to how that could happen.
@andy-dienes It appears they went out of their way to make that about as inefficiently stored as they could.
If you give each candidate a very short abbreviation ("a", "b" and "c" work great if you have 26 or fewer candidates), show a whole rank ordering per line (e.g. "b>c>g>f>a"), and then combine all identical rank orderings into a single line preceded by a count (e.g. "4238: b>c>g>f>a"), you should be able to reduce it massively.
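A minimal sketch of writing and reading that "count: ranking" format, assuming single-letter candidate codes and one line per distinct ranking. `format_summary` and `parse_summary` are hypothetical helper names, and the second ranking in the example is invented.

```python
from collections import Counter

def format_summary(counts):
    """Write one 'count: a>b>c' line per distinct ranking, largest count first."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"{n}: {'>'.join(ranking)}" for ranking, n in ordered)

def parse_summary(text):
    """Read the same format back into {ranking: count}."""
    counts = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        n, ranking = line.split(":", 1)
        counts[tuple(ranking.strip().split(">"))] += int(n)
    return counts

counts = Counter({("b", "c", "g", "f", "a"): 4238, ("a", "b"): 17})
text = format_summary(counts)
print(text)
```

Because parsing adds counts for repeated rankings, you can concatenate the text from several precincts and parse the result to get the combined totals.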
Have you parsed any of these giant files? I'd like to see how small we could get them.
@rob I think the main blowup of memory is their use of string dtypes everywhere; parsed in Python, a string is something on the order of 60 bytes. The information it contains which is relevant to the actual election (if you are more clever about symbology) can usually fit in something more like... 2 bytes. I bet if we picked a more efficient format we could get the whole thing uncompressed to about 20 or 30 MB.
@andy-dienes said in Precinct summability of IRV:
I bet if we picked a more efficient format we could get the whole thing uncompressed to about 20 or 30 MB.
Unless it represents something entirely different than what I'm imagining, I think we could get it down to 20 or 30 kilobytes.
@rob Well, I don't think you can bucket purely based on preference order, since that CSV also contains metadata about each ballot. But you are probably still right and I am overestimating. I suppose we'll just have to try it and find out.
@andy-dienes yeah I don't want any of that metadata since I am talking about a replacement for the precinct sums that come from, for instance, a choose-one election.
In those, all we need is a count for each candidate.
Here, we need a count of each "ballot configuration" that has at least one ballot, but we don't need anything else. (The number of possible configurations is the factorial of the number of candidates for full rankings, and somewhat more if truncated ballots are allowed, so it can get large as the number of candidates grows. It can never exceed the number of ballots cast, though.)
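For a feel of how fast the number of possible configurations grows, here is a small sketch. It assumes every truncated ballot counts as its own configuration (so "a>b" is different from "a>b>c"), which is one of several reasonable conventions. The function names are mine, and `math.perm` needs Python 3.8 or later.

```python
from math import factorial, perm

def full_rankings(n):
    """Configurations when every ballot ranks all n candidates."""
    return factorial(n)

def partial_rankings(n):
    """Configurations when a ballot may rank any 1..n candidates in order."""
    return sum(perm(n, k) for k in range(1, n + 1))

def max_distinct(n, ballots_cast):
    """You can't have more distinct configurations than ballots."""
    return min(partial_rankings(n), ballots_cast)

for n in (3, 5, 10):
    print(n, full_rankings(n), partial_rankings(n))
```

So with a handful of candidates the list of configurations is tiny, and with many candidates the number of ballots cast becomes the real cap.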
There is a place for all that extra data, but I just want each precinct to be able to send enough data that we can do the full tabulation, and no more. In IRV elections, they often say "if there is no candidate that has more than 50% of the first place votes, you have to wait for a week or more for us to tell you who the winner is."
Which I might be able to understand if they were sending in the results attached to the leg of a pigeon.
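To show that the counts per ballot configuration really are enough for a full tabulation, here is a bare-bones IRV sketch that works directly from them. It is only an illustration: the alphabetical tie-break for eliminations is my own assumption (real rules vary), it expects at least one ballot, and the example numbers are invented.

```python
from collections import Counter

def irv_winner(counts):
    """Run IRV on {ranking: count}; return the winning candidate."""
    remaining = {c for ranking in counts for c in ranking}
    while True:
        tallies = Counter({c: 0 for c in remaining})
        for ranking, n in counts.items():
            # Each ballot counts for its highest-ranked remaining candidate.
            top = next((c for c in ranking if c in remaining), None)
            if top is not None:
                tallies[top] += n
        active = sum(tallies.values())  # exhausted ballots drop out
        leader, votes = max(tallies.items(), key=lambda kv: kv[1])
        if votes * 2 > active or len(remaining) == 1:
            return leader
        # Eliminate the candidate with the fewest votes.
        # Assumption: ties are broken alphabetically.
        loser = min(sorted(tallies), key=lambda c: tallies[c])
        remaining.remove(loser)

# Two hypothetical precinct summaries, added together before tabulating.
precinct_1 = Counter({("a", "b", "c"): 25, ("b", "c", "a"): 20, ("c", "b", "a"): 10})
precinct_2 = Counter({("a", "b", "c"): 15, ("b", "c", "a"): 15, ("c", "b", "a"): 15})
winner = irv_winner(precinct_1 + precinct_2)
print(winner)
```

In this example "a" leads on first choices, "c" is eliminated, and "c"'s ballots transfer to "b", so "b" wins. None of that needed anything from the precincts beyond their counts.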
A pair of concepts that might have use when we think about cast ballots is "ballot token" and "ballot type". A ballot token then would be an individual ballot and a ballot type would be the equivalence class of all the ballots that say the same thing.