Just Say No!

I read Rick Spencer: Can we count users without uniquely identifying them? with sadness but not a lot of shock (this got mentioned on IRC a day or two ago).

If Canonical’s commercial customers want to count their user-base, that’s between Canonical and those customers. I do not think this kind of functionality has any place in a free software product. I do not think this should be in the Ubuntu repository or on the Ubuntu project ISO images.

44 Responses to “Just Say No!”


  1. 2 wolfger August 10, 2010 at 20:32

    What possible rational reason can you have for not wanting it in the repository?

  2. 3 wolfger August 10, 2010 at 20:33

    Semantic correction: “for wanting it excluded from the repository.”

    Not wanting it is fine. Wanting it excluded so that nobody else has easy access to it has no justification.

  3. 4 Dieki August 10, 2010 at 20:36

    Why not? I see no way in which it violates the user’s rights, or even the user’s privacy. (Neither of which technically matters; all that really matters is whether or not it violates the license in use, which it undeniably does not.)

  4. 5 Andrew Mitchell August 10, 2010 at 20:57

    As it stands at the moment, it’s only in the *partner* repository, which isn’t exactly part of Ubuntu, and solely from the descriptions of it so far, there appear to be no plans to put it on the ISO images.

  5. 6 skitterman August 10, 2010 at 20:58

    Not all things that are legal are appropriate. I don’t think user tracking that’s not opt-in is appropriate for Ubuntu. If people want to be tracked, they can enable popcon. Extending popcon so that people can provide less detail, if that is all they want to provide, would be reasonable.

    As for why not in the archive: There’s no rationale for this unless it’s installed by default to count total user base. For an optional tool to count opt-in users we already have popcon. There’s no need for another optional tool.

  6. 7 skitterman August 10, 2010 at 21:02

    @ajmitch: I don’t have a problem with it being provided for OEMs. That’s Canonical’s issue and nothing to do with Ubuntu per se. In the last paragraph though, Rick specifically throws out the idea of including it on Ubuntu ISOs. That’s what I’m reacting to.

  7. 8 Dieki August 10, 2010 at 21:42

    @skitterman It’s not user tracking. It’s user counting. The difference between the two is vast – the first is a privacy invasion in some circumstances, but the second is normal and happens in a large number of products without you even noticing. (Including Firefox, which Ubuntu already ships on the CD)

  8. 9 Dieki August 10, 2010 at 21:44

    @skitterman You say that non-opt-in user counting is inappropriate for Ubuntu. Why not? Please explain.

  9. 10 ethana2 August 10, 2010 at 22:43

    Seriously? Why not just fix the Epiphany, Chrome, Midori, and Firefox alpha/beta useragent strings? Was making a separate census package really easier?

  10. 11 skitterman August 11, 2010 at 01:58

    Collecting data without user consent is not something I consider appropriate. The fact that the data is at least nominally anonymous (in many cases it could be combined with connect IP data to make it useful for tracking) makes it less bad than if it were not, but it doesn’t make it OK in my opinion.

    Of the apps mentioned, only Firefox is shipped by default (Chrome is not even in the Ubuntu repositories), and given the conditions under which Canonical has agreed to ship Firefox, it’s not something Ubuntu can control. It would be my preference that Ubuntu not ship software that it is not free to modify (which is the case for Firefox when shipped as such).

    There is no technical need for this and so I don’t think it’s up to me to defend excluding it. Collecting data without consent is not something I find consistent with what I hope are the values of the project.

  11. 12 stefan August 11, 2010 at 02:07

    We already have http://popcon.ubuntu.com/. Why are you not going mad about this? Why does it have a place in free software?
    The same goes for http://popcon.debian.org/.
    There is more data transmitted in these contests than in the canonical-census.

  12. 13 skitterman August 11, 2010 at 02:15

    @stefan: Did you read any of the previous comments before posting? I already mentioned popcon. Popcon is completely opt-in. I have no problem with that.

  13. 14 port August 11, 2010 at 06:41

    Scott,

    Thanks for this, I was totally unaware Canonical had this planned.

    If Canonical wants to do this with their OEM customers that’s their business, however if this makes it onto the Ubuntu ISO then I will dump Ubuntu, go back to Debian, and *NEVER* buy a computer with Ubuntu pre-installed.

    I heard someone say that “Ubuntu is nothing more than the Microsoft of Open Source Software”. I didn’t believe it at first, but it seems the statement isn’t too far off the mark. How sad.

    • 15 nnonix August 11, 2010 at 10:35

      ‘I heard someone say that “Ubuntu is nothing more than the Microsoft of Open Source Software”. I didn’t believe it at first, but it seems the statement isn’t too far off the mark.’

      I can’t say enough how ridiculous the above comment is without directly insulting the author. Personally I think he needs insulting but I’m trying to be nice. My god.

  14. 16 Dieki August 11, 2010 at 07:20

    You didn’t answer the question. You said “Collecting data without user consent is not something I consider appropriate.”. I asked _why_ you didn’t think it appropriate.

  15. 17 Simon August 11, 2010 at 09:06

    I am not a commercial customer (I am part of a loco team instead) and I want to have statistics of Ubuntu usage for my country.

    Can I haz per-country or per-language reliable stats for Ubuntu installations?

  16. 18 Simon August 11, 2010 at 09:16

    I tried to extract per-country or per-language stats for Ubuntu installation using the popcon data.

    I could not do it. There is no chance for per-country stats (no IP info). For per-language stats, you could count the number of installations of the langpack package. However, you do not know how many have opted in for popcon. You can only draw relative conclusions on how the langpack popularity changes.

    This situation does not help if I want to talk to people and tell them that, look, there are XYZ number of Ubuntu installations in the country, we should do this and that.

  17. 19 Martin Owens August 11, 2010 at 09:37

    If you don’t mind it being opt-in, then technically speaking should you have a problem with the package being in universe or even installed but not enabled on the cd?

    It’s no stretch of the imagination to have such a checkbox in ubiquity. It is open source and we can see what data is being sent. None of it is user data apart from the IP address, which I would hope is thrown away.

    More importantly, are the servers it’s contacting running Free Software? Do we know how it’s counted?

  18. 20 skitterman August 11, 2010 at 10:07

    @Simon: I could see extending popcon to collect more information, such as per country. It still would not be useful to get total installed base numbers though. To get that it has to be in by default and difficult to disable, exactly what I object to.

    @Martin: Adding another opt-in mechanism for users to be counted is pointless. If it’s to be opt-in, we should just extend popcon as needed. I do object to pointless software in the Ubuntu archive and ask to have it removed when I find it (I’m not under any illusions that I’ve been fully successful in this regard).

    Even if the servers are running Free software, that doesn’t help with determining how the data is used. Once the data has been copied from the server, there’s no telling what would be done with it. That doesn’t mean I think Canonical would do something nefarious, just that there’s no way to tell.

    @Dieki: I said I thought it was inconsistent with the values of the project. While I don’t expect everyone to agree with that, I think that answers the question.

    @port: This isn’t something Canonical is doing in secret. It was mentioned as something that the community might consider (which I’ve now done). I think labeling them “Microsoft of Open Source Software” on that basis is really over the top.

  19. 21 Stoffe August 11, 2010 at 10:54

    Why would anyone use IP for this? It’s error-prone in so many ways. Either you are just counting households, or you are counting internal network IPs, and you are counting every moving laptop multiple times. And so on. IPs change. They are shared.

    Not totally foolproof, but how about a strong one-way hash of the MAC (and/or some other semi-unique data)? Not identifiable or traceable in any meaningful way, but it should still be a good way to count unique installations of a distribution. On top of that, use GeoIP to store only the connecting IP’s *country*, and throw away the IP. Also not 100% correct, but it should be close enough to get location-based stats. How to handle people travelling with the same computer is a question of taste, and should not really affect overall stats anyway.
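
    A minimal Python sketch of the hashing idea above, under stated assumptions: the function names, the salt value, and the `(token, country)` report shape are all hypothetical and are not taken from canonical-census or any real tool. The client sends only a salted SHA-256 digest of its MAC address, and the server counts distinct digests per country, having already reduced the connecting IP to a country code and discarded it.

```python
import hashlib


def install_token(mac: int, salt: bytes = b"hypothetical-census-salt") -> str:
    """Return a one-way token derived from a 48-bit MAC address.

    The salt is an assumption of this sketch; the raw MAC never leaves
    the machine, only the hex digest does.
    """
    mac_bytes = mac.to_bytes(6, "big")
    return hashlib.sha256(salt + mac_bytes).hexdigest()


def count_unique(reports):
    """Count distinct tokens per country.

    `reports` is an iterable of (token, country) pairs; the country is
    assumed to come from a GeoIP lookup done before the IP is dropped.
    """
    seen = {}
    for token, country in reports:
        seen.setdefault(country, set()).add(token)
    return {country: len(tokens) for country, tokens in seen.items()}
```

    On a real machine, `uuid.getnode()` from the standard library could supply the `mac` argument. Repeated reports from the same machine collapse into one entry, which is the property the comment is after; whether a hashed MAC is anonymous enough in practice is a separate question that this sketch does not settle.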

    • 22 skitterman August 11, 2010 at 11:06

      IP address in combination with the per-computer information proposed to be collected would track back to either individuals or a small number of users except in large installations. It’s not the perfect tracking method, but I mention it to point out that even if the information proposed to be collected is not per-user, it wouldn’t be very difficult to combine it with other information to get close to that.

  20. 23 kklimonda August 11, 2010 at 11:57

    Isn’t it all about trust? You either trust Canonical (and its privacy policy – I assume that there would be one for this service) or you don’t. Sure, there are some things I wouldn’t give away to anyone for anything under any assurance, but IP? Come on, I wouldn’t be able to browse the internet if I were afraid of people tracking my location by IP… and what can Canonical do with this data anyway? Or is this just a matter of principles?
    The problem with anything being opt-in (just like popcon) is that no one but people consciously interested are going to enable it. As Canonical is trying to gain users who are not really interested in their system of choice, FLOSS politics, etc., using an opt-in mechanism is going to make it even harder to estimate the number of installations. Why not display, as the last installation step, the Privacy Policy with a checkbox (already checked) that people can click to opt out of the census? If they really care about their privacy they will uncheck it and go their way.

    • 24 skitterman August 11, 2010 at 12:06

      The point about combining the proposed data collection with IP address is to answer the claim that trust isn’t necessary since the data can’t be used to track individual users. That is true for the given data set, but in combination with IP address, it’s not entirely true. So I agree, it’s about trust.

      For me it is a point of principle. I don’t think collecting user data without consent (opt-out does not constitute proper consent in my opinion, even if made visible) is consistent with what I believe the project’s values to be.

      I recognize that this only works if it’s mandatory. That’s why I think Ubuntu should have nothing to do with it.

  21. 26 Mer August 11, 2010 at 13:35

    Wait, so you’re saying that Canonical has no right to ask the user how many times that user has asked for updates from _Canonical_? That’s far less of a privacy violation than the most benign of cookies. In fact almost all websites keep track of how many unique visitors they have. The only other piece of information that this article discusses is the computer model, used only by the OEM that sold you Ubuntu preinstalled. You cannot be tracked at all with this system because your IP address is always changing (dynamic IP) and there will be hundreds of thousands of people who have the same number of downloads.

    Please explain exactly how Canonical could use this information to find out more than just “number of active users” and why these extra pieces of information might be an invasion of privacy.

    PS: I hope you don’t use email or any other login service, because they know EXACTLY where and when you look for updates. Not to mention your personal conversations….

    • 27 skitterman August 11, 2010 at 13:57

      What is confusing about “I don’t think collecting user data without consent is consistent with what I believe the project’s values to be.”?

      P.S. For my email, no. They don’t. For others, commercial projects have different values than Ubuntu. I don’t expect them to behave in ways that are consistent with Ubuntu values. I do expect it from Ubuntu.

  22. 28 Marie August 11, 2010 at 14:14

    Canonical OS is getting scarier by the minute. Tracking, new paid app store coming, proprietary RAM-hogging Ubuntu One and music store, promoting Microsoft Mono… hmmmmm… time to go back to Debian?

  23. 29 question August 11, 2010 at 14:43

    What possible rational reason can you have for wanting it excluded from the repository?

    • 30 skitterman August 11, 2010 at 15:45

      Please read the existing comments. If it’s not on by default, it serves no real purpose. Extending popcon to collect additional information if people opt in is a better solution than a new tool. I’m against having things that serve no real purpose in the archive.

  24. 31 Espen77 August 11, 2010 at 14:54

    I don’t mind it being in the repo or even on the ISO if you can choose to activate and deactivate this function easily both in the installer and later, and if it is not active by default (like the apt package popularity info).

    It could be a good tool if combined with the bugs filed in Launchpad. It can provide people with a good suggestion of what computer to get if they want to run Ubuntu (many users and few bugs), and the bug squishers with an idea about what bugs to fix to have a big impact for the most people.

  25. 32 Scott Ritchie August 11, 2010 at 21:31

    Can’t we just count users some other way?

    I mean it’s kinda frustrating not knowing whether we have 8 million or 24 million users but Canonical hasn’t been open at all about the way they derive their estimates. From what I can tell, though, that’s because they don’t really have good estimates at all.

    But this is a solvable problem, without needing a package everywhere.

    For example, if Canonical told us how many times per day the mirrors get a request for Packages.gz, we’d get very close to what this package is trying to accomplish: instead of sending out a special message, we’d just measure the apt-get update that happens automatically.

    This would double count manual apt-get updates, and undercount people who didn’t go online that day, but I think those two might roughly cancel each other out. It certainly wouldn’t be off by 2 orders of magnitude like the current estimates.
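
    A rough Python sketch of that measurement, assuming the mirror writes access logs in the common Apache-style format and that the package index is named `Packages.gz`; the function name and the log lines in the usage note are made up for illustration and do not come from any real mirror.

```python
import re
from collections import Counter

# Matches the date part of the timestamp and the requested path in a
# common-log-format line, e.g.
#   ... [11/Aug/2010:21:31:00 +0000] "GET /path/Packages.gz HTTP/1.1" ...
LOG_RE = re.compile(r'\[(?P<day>[^:]+):[^\]]*\] "GET (?P<path>\S+)')


def index_requests_per_day(lines, index_name="Packages.gz"):
    """Count GET requests for the package index, grouped by day.

    No IP address is read or stored; only the date and the path are used.
    """
    per_day = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and match.group("path").endswith("/" + index_name):
            per_day[match.group("day")] += 1
    return per_day
```

    Fed a day of log lines, `index_requests_per_day(lines)["11/Aug/2010"]` would give the daily figure described above, with the same caveats about double counting manual updates and missing machines that stayed offline.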

  26. 33 Matthew Jones August 12, 2010 at 13:28

    You need to read the post by Rick Spencer more carefully.

    They do not record the ip address. There is no way for the model name or counter to be connected to the ip address:

    “Notice that neither foo or the number 1 are unique data. Any number of computers will be reporting the exact same model name and increment number. When the server sees a 1 come in, it finds the first counter at 0 and increments that counter to 1”

    The entire method of counting this way was devised to not rely on IP addresses, because of NAT and tracking issues.
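
    The counter scheme in the quoted passage might be sketched in Python roughly as follows. This is a reading of the quote, not the actual census server: the class and method names are hypothetical, and two details are assumptions of the sketch, namely that a report with no matching counter starts a new one, and that a histogram of counter values stands in for "finding the first counter".

```python
from collections import Counter, defaultdict


class CensusServer:
    """Hypothetical server-side tally: one anonymous counter per install."""

    def __init__(self):
        # model name -> {counter value: how many counters hold that value}
        self._hist = defaultdict(Counter)

    def report(self, model, n):
        """Handle a client report of (model, n).

        If some counter for this model sits at n - 1, advance it to n.
        Otherwise (assumption) treat the report as a new install.
        """
        hist = self._hist[model]
        if n > 0 and hist[n - 1] > 0:
            hist[n - 1] -= 1
        hist[n] += 1

    def installs(self, model):
        """Number of counters for this model, i.e. the estimated installs."""
        return sum(self._hist[model].values())
```

    Two machines of model "foo" that each report 0 and then 1 leave the server holding two counters at 1, so `installs("foo")` is 2, and nothing stored says which machine owns which counter.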

    What is with all the Canonical hate? Most of it is misinformation like this.

    • 34 skitterman August 12, 2010 at 13:36

      Connect IP address is routinely logged, so even if this application doesn’t collect it, it is available.

      Whether they use it to de-anonymize the information or not is irrelevant to me. I will say (again) that I think sending any information not technically required without active user consent is not consistent with what I hope are the values of the Ubuntu project.

      Please point out any factual errors in the post and I will correct them.

  27. 35 Matthew Jones August 12, 2010 at 13:56

    De-anonymize the information. Are you kidding me?

    There is nothing your machine is sending to the census server that it isn’t already sending to the repos when it pulls down updates. Every time you update Fedora, Ubuntu, etc., they get your IP address and distro version.

    All this does is get a count of the installs without tying them to IP addresses. Right now Fedora records the number of repo requests for each IP address. This is the same as what Fedora does, but gets accurate numbers, and does not connect anything to IP addresses. This is less bad than what Fedora is already doing.

    Why is it that no one complains about Fedora tracking, but everyone is assuming that Canonical is going to do evil with this?

    • 36 skitterman August 12, 2010 at 14:03

      If I ran Fedora, I’d complain at that too.

      Where did I say Canonical would do evil? I just said I didn’t think collecting data without consent was appropriate. The information passed when checking for updates is technically required. This is not and that makes it different in my opinion.

    • 37 Jef Spaleta August 12, 2010 at 14:42

      Let’s clarify. Right now Fedora is using the same httpd logging mechanisms that a package like awstats uses to do HTTP log analysis. Fedora does its counting in part based on the number of unique IP addresses that request the dynamically generated mirrorlist from the MirrorManager service for their release version and architecture as part of a default install. Knowing the client IP address is a vital part of the MirrorManager service. Based on IP address, a tailored mirrorlist (potentially tailored by the network admin of your institutional network if they are signed up with MirrorManager) is handed back to the user, hopefully conserving institutional external bandwidth and pointing users to fast mirrors in their local network segment when available, without any reconfiguration on the user’s part. That is very good for roaming systems like laptops or live USB keys with a persistent overlay. Fedora doesn’t log the IP address for the express purpose of mining the data; it’s a necessary piece of information to make the MirrorManager service work. I would dare say that because the _intention_ is to provide a valuable user service in MirrorManager, the mining of the IP address information as a secondary objective is less problematic.

      That being said… you need to realize that every single web service on the planet is logging IP addresses. You have to assume that every single webserver in operation that you connect to is creating the standard logs which record the IP addresses and the URLs being accessed. I would dare say that most HTTP mirror operators are generating the very same logs. Users who choose to bypass MirrorManager completely and reconfigure to use external mirrors of their choosing still get their IP addresses logged somewhere. Any time you make a network connection to an HTTP server on the net, you must assume that your IP address and the URL you are requesting are being logged by the operator of that service. You have no way to confirm that they are not, nor is there any legal requirement for any of them to tell you that they are doing IP address connection logging. And even if there were, it would be impossible for you to know that without going to their site to read their policy and thus being logged in the process.

      -jef

  28. 38 Matthew Jones August 12, 2010 at 14:33

    I didn’t say you said they were evil. I said you were “assuming that canonical is going to do evil”:

    “Even if the servers are running Free software, that doesn’t help with determining how the data is used. Once the data has been copied from the server, there’s no telling what would be done with it. That doesn’t mean I think Canonical would do something nefarious, just that there’s no way to tell.”

    Here you are assuming that because the potential for tracking abuse exists, it should not be used. You could say that about anything, including WordPress, any ISP, or connecting to the repos.

    That is an unproductive knee-jerk reaction to have, especially when the whole new method for the census is based around it not being privacy violating.

    Would you be happy if Canonical signed a promise not to record IP addresses on the census server? Without recording the IP addresses with the census, there would be no way to link the data to an IP address.

    • 39 skitterman August 12, 2010 at 14:48

      No.

      I will say (again): I don’t think collecting data from users that is not technically required without consent is appropriate.

      What they will or will not do or how much I trust or don’t trust them is irrelevant.

      I realize that leaves open the question of how many Ubuntu users there are, but I think that’s OK.

  29. 40 Matthew Jones August 12, 2010 at 14:54

    We’ll just have to agree to disagree then.

  30. 41 Mircea August 15, 2010 at 04:43

    Do you use Google? They log your IP address as well as the search terms AND they don’t ask you to give your consent.

    I hope you don’t use Google, because that would just make you a hypocrite.

    • 42 skitterman August 15, 2010 at 16:40

      You might want to go study the meaning of that word. It’s not related to thinking that free software projects should have different standards than proprietary services.

      • 43 Mircea August 16, 2010 at 05:40

        Main Entry: hyp·o·crite
        Pronunciation: \ˈhi-pə-ˌkrit\
        Function: noun
        1 : a person who puts on a false appearance of virtue or religion
        2 : a person who acts in contradiction to his or her stated beliefs or feelings
        (Merriam-Webster)

        My intention is not to insult you. I’m simply pointing out how you might seem to some, because hypocrites are not usually taken seriously when stating views.

        I get where you’re coming from and I don’t disagree. However, while I consider freedom a goal definitely worth pursuing, I’m not sure if we should make it _the_ ultimate goal. Perhaps we might consider helping others a worthwhile goal, too. And this goal might mean we need to sacrifice a bit of freedom for some limited time. I think that by spreading free (as in beer) and libre (as in speech) software we do a lot of good to people who would not otherwise be able to afford to buy software. It might mean that for the time being we need to prove to OEMs that it’s worth their while to support the free/libre software we advocate.

        We should always keep the freedom goal in view and never give up on it. I certainly don’t want us to end up in the same situation as the US citizens who lost their freedom bit by bit.

