I want to cover the recent steps I took in learning how to get geolocation information for OpenAttribution. The first step was to look into two existing open source analytics platforms that are alternatives to Google Analytics: Umami and Matomo.
Umami
I love using umami.is! It’s definitely the smaller of the two projects, but something about its look and smaller size spoke to me, so it’s what I use on this site for my own self-hosted tracking.
Looking on GitHub we can see Umami builds their geo database here:
https://github.com/umami-software/umami/blob/d76603b5b72b56d73bbd15e66042ebb5de629163/scripts/build-geo.js#L31
    let url = `https://raw.githubusercontent.com/GitSquared/node-geolite2-redist/master/redist/${db}.tar.gz`;

    if (process.env.MAXMIND_LICENSE_KEY) {
      url =
        `https://download.maxmind.com/app/geoip_download` +
        `?edition_id=${db}&license_key=${process.env.MAXMIND_LICENSE_KEY}&suffix=tar.gz`;
    }

    const dest = path.resolve(__dirname, '../geo');

    if (!fs.existsSync(dest)) {
      fs.mkdirSync(dest);
    }

    const download = url =>
      new Promise(resolve => {
        https.get(url, res => {
          resolve(res.pipe(zlib.createGunzip({})).pipe(tar.t()));
        });
      });
So, it looks like the default here is to use the raw data hosted on GitHub by GitSquared/node-geolite2-redist, whose README says this:
As this is a redistribution, you don’t need a MaxMind license key. However, some additional legal restrictions apply, make sure to read this README and the Legal Warning carefully before deciding to use this.
So node-geolite2-redist uploads the MaxMind GeoLite2 databases twice a week, and Umami uses that as the default for its IP detection. I also found a Python package that helps geolocate based on IP addresses and is currently maintained:
- rabuchaim/geoip2fast: a Python package, also hosted on GitHub. It updates twice a week and, when used as a package, automates downloading the redistributed MaxMind GeoLite2 database.
Wait, what is MaxMind GeoLite2?
I hadn’t heard of MaxMind’s GeoIP services before, but roughly 90% of the projects I looked at seem to rely on MaxMind, so it seems to be the dominant force in collecting and curating the data that maps IP addresses to geolocations. Within MaxMind’s offerings there are a couple of choices to make:
Webservice vs Binary Database
The question here is whether you call their web service API live or use a pre-downloaded binary database file (for GeoLite2 this was about 50 MB). A quick pros/cons: the web service is always up to date, but returns less information (i.e. longitude/latitude). With the binary database you don’t need to worry about rate limiting, since it’s your own file, but you’ll need to manage downloading and updating it twice a week as new files become available.
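The "manage downloading and updating" part mostly comes down to a staleness check. Here is a minimal Python sketch of how I might decide whether a local database file needs re-downloading. The function name `needs_refresh`, the 4-day threshold, and the example path are my own assumptions for illustration, not anything taken from MaxMind or Umami.

```python
import os
import time
from typing import Optional

SECONDS_PER_DAY = 86400
# Assumption: with new files appearing twice a week, anything older than
# about 4 days is probably behind. Tune this to the real release cadence.
DEFAULT_MAX_AGE_DAYS = 4


def needs_refresh(
    path: str,
    max_age_days: float = DEFAULT_MAX_AGE_DAYS,
    now: Optional[float] = None,
) -> bool:
    """Return True if the database file is missing or older than max_age_days."""
    if not os.path.exists(path):
        return True
    current_time = time.time() if now is None else now
    age_seconds = current_time - os.path.getmtime(path)
    return age_seconds > max_age_days * SECONDS_PER_DAY
```

A scheduled job could call something like `needs_refresh("geo/GeoLite2-City.mmdb")` (a hypothetical path) and only trigger a download when it returns `True`, which keeps the lookups themselves free of any network calls.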
MaxMind GeoIP2 vs GeoLite2
This was the one that took me a while to realize, but it is definitely the bigger business differentiation. GeoIP2 is the more accurate, more detailed version of MaxMind’s data, but it requires registering and paying. Meanwhile, they offer a free version, `GeoLite2`, which also requires a sign-up and an API key to download, but at no cost.
MaxMind GeoIP2 database product costs as of December 2024. For my use case it would be $134 per month.
Ok, back to Umami using the GeoLite2 Redist
So, the GeoLite2 redist takes the freely available database and publishes it to GitHub/NPM/Python. The API key Umami references would be for clients that want to use Umami and are paying for some version of the MaxMind web service / database GeoIP products.
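To make that fallback concrete for myself, here is a rough Python rendering of the URL choice in Umami’s build script. The two URLs come straight from the script quoted earlier; the function name `geo_db_url` is hypothetical, and `GeoLite2-City` is just an example edition name.

```python
from typing import Optional
from urllib.parse import urlencode

# Both base URLs are copied from Umami's build-geo.js.
REDIST_BASE = (
    "https://raw.githubusercontent.com/GitSquared/node-geolite2-redist/master/redist"
)
MAXMIND_DOWNLOAD = "https://download.maxmind.com/app/geoip_download"


def geo_db_url(db: str, license_key: Optional[str] = None) -> str:
    """Pick the download URL: MaxMind if a key is supplied, else the redist."""
    if license_key:
        query = urlencode(
            {"edition_id": db, "license_key": license_key, "suffix": "tar.gz"}
        )
        return f"{MAXMIND_DOWNLOAD}?{query}"
    # No key configured: fall back to the redistributed GeoLite2 archive.
    return f"{REDIST_BASE}/{db}.tar.gz"
```

In practice the key would come from the client’s environment, for example `geo_db_url("GeoLite2-City", os.environ.get("MAXMIND_LICENSE_KEY"))`, so the project itself never has to ship a key.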
This makes a lot of sense, and while a bit hacky, seems like a pretty quick way to get started on this project. But what about Matomo, are they also doing that?
Matomo’s geo location
Matomo, another open source Google Analytics alternative, also includes geolocation of IP addresses. Their GeoIP setup is significantly more extensive than Umami’s, and all of it is manageable via their dashboard. This ease of setup comes with a cost, though, since the initial default option is much less accurate: “The default option guesses a visitor’s country based on the language that their browser is set to use.”
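To get a feel for why a language-based guess is less accurate, here is a small Python sketch of the general idea: pull a region code out of the browser’s `Accept-Language` header. This is my own illustration, not Matomo’s implementation, and the function name `guess_country` is hypothetical.

```python
from typing import Optional


def guess_country(accept_language: str) -> Optional[str]:
    """Guess a two-letter country code from an Accept-Language header value."""
    candidates = []
    for position, part in enumerate(accept_language.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip()
        if not tag or tag == "*":
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        candidates.append((-quality, position, tag))

    # Highest quality first; header order breaks ties.
    for _, _, tag in sorted(candidates):
        subtags = tag.replace("_", "-").split("-")
        for subtag in subtags[1:]:
            # Region subtags are two letters (e.g. "US"); scripts such as
            # "Hant" are four letters and get skipped.
            if len(subtag) == 2 and subtag.isalpha():
                return subtag.upper()
    return None
```

A browser set to `en-US` yields `US` no matter where the visitor actually is, and a bare `en` yields nothing at all, which is why this only works as a rough default.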
Matomo also exposed me to a few more options I should look into:
What is DB-IP.com?
Ok, db-ip looks very similar to MaxMind, but a bit newer. They also have their own libraries for using the live web API, plus database files that are available for download. I found their site much easier to navigate, and it seems they kept some conventions from MaxMind, which made it quick to understand their offerings:
- Web Service API and Downloadable Databases as MMDB and CSV
- A paid accurate version and a less accurate “lite” free version. The paid version costs €499 a year, so much cheaper than MaxMind, where $134 per month works out to about $1,608 a year.
Unfortunately, since I’m not looking to spend money here, it doesn’t seem like a great fit. The free versions are only updated once a month. I don’t know what that means in the real world for a product, like how often it would lead to inaccurate results, but compared to the twice-weekly updates of the free version from MaxMind that seems slow. If anyone knows more about this please let me know; maybe this data doesn’t change enough to warrant weekly updates.
So, should I set up and include my own API key?
No. This is definitely not something I want to do or will do. Each client will have vastly different use cases, and I should not register with either MaxMind or db-ip.com and then freely distribute my own API key.
I’m in over my head, time to call an expert
After stumbling on a great little Python / AWS example MaxMind project, I reached out to its author, John Lukach of 4n6ir, who is much more knowledgeable about IP geolocation services and their uses. He was able to give some good advice on the best thing for an open source analytics project to do when I can’t include an API key.
- Best is to ask clients to use their own API key. This follows what Matomo is doing. It requires a much bigger setup for the dashboard services, but should be my end goal.
- Geolocation data is best effort and shouldn’t be relied on. This is something I already knew, but getting into this I did get a bit sidetracked. No matter what, the MaxMind or db-ip data is always a bit out of date, always playing catch-up with the actual truth of which ISPs have which sets of IPs. It’s worth keeping in mind that for my mobile attribution, if I’m looking for a source of truth, I should perhaps use mobile APIs to determine the user’s location.
- ASN enrichment is very useful for identifying cloud providers and VPNs. This is further out, but will be very useful for checking for ad fraud or other patterns. So MaxMind provides an approximate location, and ASN enrichment helps detect traffic anomalies associated with fraud, VPNs, etc.
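As a note to my future self on the ASN point, here is a tiny Python sketch of what that enrichment could look like once each click has an ASN attached. The ASN table is a hand-picked sample of a few well-known cloud networks, used purely for illustration; a real version would need a maintained list, and the function names `classify_asn` and `hosting_share` are hypothetical.

```python
from typing import Iterable, Optional

# Illustrative sample only, not a complete or maintained list.
HOSTING_ASNS = {
    16509: "Amazon (AWS)",
    15169: "Google",
    8075: "Microsoft",
    14061: "DigitalOcean",
}


def classify_asn(asn: Optional[int]) -> str:
    """Label an ASN as 'hosting', 'other', or 'unknown' when the lookup failed."""
    if asn is None:
        return "unknown"
    if asn in HOSTING_ASNS:
        return "hosting"
    return "other"


def hosting_share(asns: Iterable[Optional[int]]) -> float:
    """Fraction of events whose ASN belongs to a known hosting provider."""
    labels = [classify_asn(asn) for asn in asns]
    if not labels:
        return 0.0
    return labels.count("hosting") / len(labels)
```

The thinking is that real mobile users mostly arrive from carrier or residential networks, so a campaign where `hosting_share` is unusually high would be worth a closer look for fraud or VPN traffic.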
Final thoughts
Ok, going back to Umami, I see why they did it that way. It’s quite fast and easy to implement, but leaves room to develop in the direction that Matomo went. So for now, I think I will start with the free redistributed versions of MaxMind GeoLite2 and see where that gets me for my OpenAttribution MVP.