2023.03.20

Hosting a Collection Server

Last updated 2023.03.20

In the interest of being as transparent as we can, we make it possible to host your own instance of a Pushbroom data collection server. The source code for the server is available on GitHub under the MIT open source license.

The server is a SvelteKit application that runs only route endpoint functions. This allows the application to be built as either a collection of serverless functions or a traditional Node.js app, whichever is appropriate for your deployment needs. The repository includes a Dockerfile for building a container image of the server, as well as a fly.toml configuration file for deploying that container to Fly.io. Pushbroom can also be deployed as-is to Vercel.

Setting up a self-hosted instance of Pushbroom requires a few small configuration steps.

Setting the Environment Variables for Your Triplestore.

To run your own collection server, you need to host your own triplestore to hold the data you’re collecting. Once you’ve done that, set the following .env variables to reference your store and credentials:

sparql_endpoint=https://stucco-proxy.fly.dev
sparql_user=username
sparql_password=password
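
Before deploying, it can help to confirm that all three variables are present and that the endpoint is a well-formed URL. The following is a minimal sketch, not the server's own code: the helper name loadTriplestoreConfig is hypothetical, and it assumes the variables are exposed to Node.js through process.env (how the Pushbroom server itself reads them may differ).

```javascript
// Hypothetical helper: read and validate the three triplestore settings.
// The variable names come from the .env example above; everything else
// in this sketch is an assumption made for illustration.
function loadTriplestoreConfig(env = process.env) {
  const names = ['sparql_endpoint', 'sparql_user', 'sparql_password'];

  // Collect every missing or empty variable so the error lists them all.
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // new URL() throws a TypeError if the endpoint is not a valid absolute URL.
  const endpoint = new URL(env.sparql_endpoint);

  return {
    endpoint: endpoint.href,
    user: env.sparql_user,
    password: env.sparql_password,
  };
}
```

Calling loadTriplestoreConfig() once at startup surfaces a misconfigured deployment immediately, rather than on the first failed write to the store.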

Setting the Collection Endpoint URL.

The file at /static/ping.js contains the client-side script that runs in the browser and sends data to Pushbroom. On the last line of this file, the function is initialized with the URL of the collection server:

…(window, document, 'ping.pushbroom.co')

Change ping.pushbroom.co to the URL where you will be hosting the server.

Now, when the script is loaded, it will start sending analytics data to your deployment of Pushbroom, which will persist the data in your own hosted triplestore.
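
If you would rather script the change than edit the file by hand, the substitution can be sketched as below. This is an illustration only: the helper setCollectionHost and the host ping.example.com are hypothetical, and the sample string reproduces only the visible part of the initialization call, not the real contents of ping.js.

```javascript
// Hypothetical helper: swap the default collection host in the source text
// of ping.js. Only the final occurrence is replaced, since the host is
// passed in on the last line of the file.
const DEFAULT_HOST = 'ping.pushbroom.co';

function setCollectionHost(source, newHost) {
  const lastIndex = source.lastIndexOf(DEFAULT_HOST);
  if (lastIndex === -1) {
    throw new Error(`"${DEFAULT_HOST}" not found in script source`);
  }
  return (
    source.slice(0, lastIndex) +
    newHost +
    source.slice(lastIndex + DEFAULT_HOST.length)
  );
}

// Stand-in for the visible tail of the last line of the file.
const lastLine = "(window, document, 'ping.pushbroom.co')";
console.log(setCollectionHost(lastLine, 'ping.example.com'));
// prints: (window, document, 'ping.example.com')
```

The example value has no scheme or path, so the replacement here follows the same form.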

Running the Pushbroom Analytics Application Against Your Data.

Since all your data is stored in a standards-implementing triplestore, it’s all freely accessible via SPARQL as RDF. You can run your own analysis tools against your data, pipe it into different applications, or just download it all for static analysis.
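
As a rough sketch of what querying the store yourself might look like, the code below sends a SELECT query over the SPARQL 1.1 Protocol and flattens the standard JSON results format into plain objects. It assumes Node.js 18 or later (for the global fetch) and that your triplestore accepts HTTP Basic authentication; the function names runSelect and bindingsToRows are hypothetical. The query is deliberately generic, because the vocabulary Pushbroom uses for its data is not described here.

```javascript
// Flatten SPARQL 1.1 JSON results into one plain object per result row,
// keeping only the value of each bound term.
function bindingsToRows(json) {
  return json.results.bindings.map((binding) =>
    Object.fromEntries(
      Object.entries(binding).map(([name, term]) => [name, term.value])
    )
  );
}

// Send a SELECT query as a URL-encoded POST, per the SPARQL 1.1 Protocol.
// Assumption: the store uses HTTP Basic auth with the configured credentials.
async function runSelect({ endpoint, user, password }, query) {
  const credentials = Buffer.from(`${user}:${password}`).toString('base64');
  const response = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      Accept: 'application/sparql-results+json',
      Authorization: `Basic ${credentials}`,
    },
    body: new URLSearchParams({ query }),
  });
  if (!response.ok) {
    throw new Error(`SPARQL request failed with status ${response.status}`);
  }
  return bindingsToRows(await response.json());
}

// A generic query that returns ten arbitrary triples from the store.
const SAMPLE_QUERY = 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10';
```

With the endpoint and credentials from your .env file collected in an object, runSelect({ endpoint, user, password }, SAMPLE_QUERY) resolves to an array of rows, each with s, p, and o keys.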

In order to use the Pushbroom Analytics application to understand your data, you’ll have to use a white-labeled deployment that communicates with your – and only your – hosted servers. If you’re interested in this solution, please reach out to us for more information!

References

  1. https://github.com/stucco-software/pushbroom-server
  2. https://www.w3.org/TR/rdf-sparql-query/
  3. https://www.w3.org/RDF/