Archives with ActivityPub

This article is about using ActivityPub in a way that's incompatible with current implementations. It describes ActivityPub as a DDL, or data description language, for describing data that we may not typically think of as ActivityPub Objects, that may have IDs that are not HTTP IRIs, and that could live in offline collections.

First, a little weirdness

I'll save the real wildness for a blog post about the ActivityPub client-to-server protocol, because server-to-server federation is essentially a problem in replicating a distributed database, which is quite enough for one article and certainly not what most people are thinking about when they hit 'toot'. Even if an application isn't microblogging, and even if it prefers a different syntax (YAML rather than JSON, say), ActivityStreams and possibly ActivityPub are worth a look for their vocabulary. They are, respectively, a set of conventions for tracking changes in media collections and a method of routing change information to interested parties.

If someone has worked with SQL, their immediate reaction to reading the ActivityStreams Vocabulary is likely to be CRUD. Actually 'CrUD', because read operations are implicit. It's obviously a database language, but the unit of data that Create and Delete operate on is the Object. Update changes an Object's properties, while Add and Remove change an Object's membership in Collections.
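
To make the database reading concrete, here is a minimal sketch that translates Activities into SQL writes against an in-memory SQLite store. The two-table schema and the `apply_activity` function are hypothetical. Following the ActivityPub spec, Add and Remove name the affected collection in `target`; Delete here simply drops the row, whereas real ActivityPub servers may keep a Tombstone instead.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """In-memory store: one table of Objects, one of collection memberships."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE members (collection TEXT, object TEXT, "
        "PRIMARY KEY (collection, object))"
    )
    return db


def apply_activity(db: sqlite3.Connection, activity: dict) -> None:
    """Translate one Activity into the equivalent SQL write.

    Reads never appear: they need no Activity, hence 'CrUD'. Create and Update
    are assumed to carry the embedded Object; Update replaces it wholesale.
    """
    kind = activity["type"]
    obj = activity["object"]
    obj_id = obj["id"] if isinstance(obj, dict) else obj
    if kind == "Create":
        db.execute("INSERT INTO objects (id, body) VALUES (?, ?)", (obj_id, json.dumps(obj)))
    elif kind == "Update":
        db.execute("UPDATE objects SET body = ? WHERE id = ?", (json.dumps(obj), obj_id))
    elif kind == "Delete":
        db.execute("DELETE FROM objects WHERE id = ?", (obj_id,))
        # Design choice of this sketch: a deleted Object leaves every collection.
        db.execute("DELETE FROM members WHERE object = ?", (obj_id,))
    elif kind == "Add":
        db.execute("INSERT OR IGNORE INTO members VALUES (?, ?)", (activity["target"], obj_id))
    elif kind == "Remove":
        db.execute(
            "DELETE FROM members WHERE collection = ? AND object = ?",
            (activity["target"], obj_id),
        )
    else:
        raise ValueError(f"unhandled activity type: {kind}")
```

Create and Delete touch whole rows of `objects`, while Add and Remove only touch `members`, which mirrors the split between an Object and its relation to Collections.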

So it's not especially relevant what the original application for ActivityPub may have been. If you want to manage objects in collections and route change information to interested parties, it's simply a matter of applying the vocabulary to what you mean by an object and a collection. From there, it's a matter of filling in any missing semantics. That's not trivial, but it's much easier than inventing an entire protocol from nothing, and it has the advantage of having already solved problems that you may not have anticipated.

What is an Object?

An Object is your content, wrapped in metadata. Read the specifications for the formal description; for our purposes we need the source attribute along with the required id and type attributes.
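
A minimal sketch of such an Object as a Python dict. The `make_object` helper, the `urn:uuid` identifier and the content string are hypothetical; the point is only the shape: `id` and `type`, plus a `source` holding `mediaType` and `content`. (ActivityPub defines `source` as the material that `content` markup was derived from, so treating it as the primary payload is this article's repurposing.)

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def make_object(object_id: str, object_type: str, media_type: str, content: str) -> dict:
    """Wrap content in minimal metadata: id, type and source."""
    return {
        "@context": AS_CONTEXT,
        "id": object_id,
        "type": object_type,
        "source": {"mediaType": media_type, "content": content},
    }


# Hypothetical archive entry; the urn:uuid id is illustrative, not an HTTP IRI.
record = make_object(
    "urn:uuid:3f6c1d2e-8a4b-4c1d-9e2f-0a1b2c3d4e5f",
    "Note",
    "text/plain",
    "Spring 1901 orders for France",
)
print(json.dumps(record, indent=2))
```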

The source attribute is an object with two attributes, mediaType and content. The media type can be any MIME type, so there's no constraint on the kind of media. It may not be convenient to embed a Base64-encoded video in a printed archive, but you could.
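
A sketch of carrying binary media in `source`. ActivityStreams has no property that marks `content` as Base64, so this assumes an archive-wide convention that binary media types are always stored Base64-encoded; `embed_binary` and `extract_binary` are hypothetical helpers, and the bytes are merely illustrative.

```python
import base64


def embed_binary(data: bytes, media_type: str) -> dict:
    """Build a source attribute carrying arbitrary bytes as Base64 text.

    Assumption: the archive treats content under a binary mediaType as Base64
    by convention, since nothing in the vocabulary says so explicitly.
    """
    return {"mediaType": media_type, "content": base64.b64encode(data).decode("ascii")}


def extract_binary(source: dict) -> bytes:
    """Recover the original bytes from a source built by embed_binary."""
    return base64.b64decode(source["content"])


clip = b"\x00\x00\x00\x18ftypmp42"  # a few illustrative MP4-style header bytes
source = embed_binary(clip, "video/mp4")
assert extract_binary(source) == clip
```

Base64 grows the payload by about a third, which is part of why a printed video archive is possible but not convenient.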

The id is a publicly dereferenceable IRI. There are good reasons for assuming it will be an HTTP identifier and a string, and those reasons matter for interoperability. If we're managing an archive, however, other considerations may matter more for our use case than interop.

It may be useful to have a list of copies: HTTP IRIs reachable over DNS or Tor, FTP URLs, file system URLs for maintainers, geo URIs for physical copies. If interoperability with existing implementations isn't a primary concern, the semantics of ActivityPub already support collections of arbitrary source data, with a method for creating identifiers that covers any conceivable access method.
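
One way to record those copies, sketched under an assumption: the article doesn't say which property should hold them, so this uses the ActivityStreams `url` property, which may hold several Link objects. Every hostname, path and coordinate below is a placeholder, and `copies_by_scheme` is a hypothetical helper that groups copies by access method.

```python
from urllib.parse import urlparse

# Hypothetical archive record; every hostname, path and coordinate is a placeholder.
record = {
    "id": "urn:uuid:9b2e4f7a-1c3d-4e5f-8a9b-0c1d2e3f4a5b",
    "type": "Document",
    "url": [
        {"type": "Link", "href": "https://archive.example.org/zines/42"},
        {"type": "Link", "href": "http://examplearchiveplaceholder.onion/zines/42"},
        {"type": "Link", "href": "ftp://ftp.example.org/zines/42.pdf"},
        {"type": "Link", "href": "file:///srv/archive/zines/42.pdf"},
        # geo URI (RFC 5870) for the physical copy; u=10 is uncertainty in metres
        {"type": "Link", "href": "geo:51.5034,-0.1276;u=10", "name": "Shelf 3, box 7"},
    ],
}


def copies_by_scheme(obj: dict) -> dict[str, list[str]]:
    """Group every known copy of an Object by access method (its URI scheme)."""
    links = obj.get("url", [])
    if not isinstance(links, list):  # url may be a single value rather than a list
        links = [links]
    groups: dict[str, list[str]] = {}
    for link in links:
        href = link["href"] if isinstance(link, dict) else link
        groups.setdefault(urlparse(href).scheme, []).append(href)
    return groups


for scheme, hrefs in sorted(copies_by_scheme(record).items()):
    print(scheme, hrefs)
```

A maintainer's tool might prefer `file` copies, a public mirror `https` ones, and a visitor to the reading room the `geo` entry, all from the same record.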

The Object type is a mechanism for communicating processing and display expectations. Note, Article and some other ActivityStreams types may be relevant to an archive. If an extension, formal or otherwise, is desirable, type can be a list that includes the base type Object, which should allow other implementations to communicate about the Object regardless of whether they can render the content. This means that metadata about your Object can be handled consistently by implementations that don't yet exist, e.g. a future archivist who scans printouts of your YAML dumps.
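
A sketch of how a consumer might use such a type list: pick the first type it understands and fall back to the generic Object otherwise. `KNOWN_TYPES`, `display_type` and the `ex:GameRecord` extension type are all hypothetical; a real extension would also need a JSON-LD context defining the `ex:` prefix.

```python
# Types this hypothetical renderer knows how to display.
KNOWN_TYPES = {"Object", "Note", "Article", "Document", "Image", "Video"}


def display_type(obj: dict, known: set[str] = KNOWN_TYPES) -> str:
    """Return the first listed type we understand, else fall back to Object."""
    types = obj.get("type", "Object")
    if isinstance(types, str):  # type may be a single string or a list
        types = [types]
    for candidate in types:
        if candidate in known:
            return candidate
    return "Object"


# Listing Object alongside the extension keeps the record usable by anyone.
game = {
    "id": "urn:uuid:7d3a9c1e-2b4f-4a6d-8e0c-1f2a3b4c5d6e",
    "type": ["ex:GameRecord", "Object"],
    "name": "Game 17",
}
print(display_type(game))  # Object
```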

Some Objects are Activities, which carry routing information and notifications about how the contents of Collections have changed. This will be relevant for maintaining consistent state in archives.

Activities, collections and streams

This is where we develop a structural understanding of ActivityPub relevant to archival processes.

An ActivityPub Collection is a collection in the plain sense of the word, but specific collections, like the Followers Collection, have defined uses for routing information. Collections of content beyond the Inbox and Outbox are an underutilized feature of ActivityPub, but likely a principal concern of an archivist.
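
A sketch of an archive collection of content, split into pages using the ActivityStreams OrderedCollection and OrderedCollectionPage types with their `first`, `last`, `next`, `prev` and `partOf` links. The `paginate` function and its page-id scheme (fragments of the collection's own id, so they remain valid for `urn:` or `file:` ids) are assumptions of this sketch.

```python
import math


def paginate(collection_id: str, item_ids: list[str], page_size: int) -> list[dict]:
    """Split a collection into an OrderedCollection and linked OrderedCollectionPages."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    n_pages = max(1, math.ceil(len(item_ids) / page_size))

    def page_id(n: int) -> str:
        # Hypothetical scheme: each page is a fragment of the collection id.
        return f"{collection_id}#page-{n}"

    collection = {
        "id": collection_id,
        "type": "OrderedCollection",
        "totalItems": len(item_ids),
        "first": page_id(0),
        "last": page_id(n_pages - 1),
    }
    pages = []
    for n in range(n_pages):
        page = {
            "id": page_id(n),
            "type": "OrderedCollectionPage",
            "partOf": collection_id,
            "orderedItems": item_ids[n * page_size:(n + 1) * page_size],
        }
        if n > 0:
            page["prev"] = page_id(n - 1)
        if n < n_pages - 1:
            page["next"] = page_id(n + 1)
        pages.append(page)
    return [collection, *pages]
```

Paging matters more for an archive than for a timeline: a page is a natural unit for a tape volume or a printed booklet.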

The name of ActivityStreams, the vocabulary and data format that ActivityPub builds on, is a hint as to the way Activities work. A stream of Activities would be all the Activities that affect an Object. This implies that Activities are a journaling mechanism: you can index Activities for efficient processing, and you can use them collectively to build or verify the state of the Objects they reference.
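
A sketch of the journal idea: index a stream of Activities by the Object each one affects, then replay only the indexed entries to rebuild or verify that Object. The helper names are hypothetical, and the replay makes simplifying assumptions stated in its docstring.

```python
from collections import defaultdict


def object_id_of(activity: dict) -> str:
    """Return the id of the Object an Activity refers to, embedded or by reference."""
    obj = activity["object"]
    return obj["id"] if isinstance(obj, dict) else obj


def build_index(stream: list[dict]) -> dict[str, list[int]]:
    """Map each Object id to the positions, in order, of the Activities affecting it."""
    index = defaultdict(list)
    for position, activity in enumerate(stream):
        index[object_id_of(activity)].append(position)
    return dict(index)


def replay(stream: list[dict], index: dict[str, list[int]], object_id: str):
    """Fold only the indexed Activities for one Object into its current state.

    Simplifying assumptions: Create and Update embed the full Object and Update
    replaces it wholesale; Add and Remove change Collections, not the Object,
    so they are skipped. Returns None if the Object was deleted or never created.
    """
    state = None
    for position in index.get(object_id, []):
        activity = stream[position]
        if activity["type"] in ("Create", "Update"):
            state = activity["object"]
        elif activity["type"] == "Delete":
            state = None
    return state


def verify(stream: list[dict], index: dict[str, list[int]], stored: dict) -> bool:
    """Check a stored copy of an Object against the state its journal implies."""
    return replay(stream, index, stored["id"]) == stored
```

The index is what makes this cheap: rebuilding one Object touches only its own Activities rather than the whole stream.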

So Activities are Objects that affect, or describe effects on, the state of other Objects. They describe the delta, or change, in an object or collection in a manner similar to version control for software, but with a granularity of change suited to managing media collections.
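
ActivityPub's client-to-server Update already has delta semantics: top-level fields in the supplied object replace the old values, and fields set to null are removed (server-to-server Update, by contrast, carries the entire new representation). A sketch of computing and applying such a delta, assuming only top-level fields are compared; `diff_update` and `apply_update` are hypothetical helpers.

```python
_MISSING = object()  # sentinel: distinguishes "absent" from a stored value


def diff_update(old: dict, new: dict) -> dict:
    """Describe new as a delta against old: changed top-level fields,
    with removed fields set to None (serialized as JSON null)."""
    delta = {"id": new["id"]}
    for key in old.keys() | new.keys():
        if key == "id":
            continue
        if key not in new:
            delta[key] = None
        elif old.get(key, _MISSING) != new[key]:
            delta[key] = new[key]
    return delta


def apply_update(obj: dict, delta: dict) -> dict:
    """Apply a partial Update: None removes a field, anything else replaces it."""
    result = dict(obj)
    for key, value in delta.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result


before = {"id": "urn:o:1", "type": "Note", "content": "draft", "summary": "tmp"}
after = {"id": "urn:o:1", "type": "Note", "content": "final", "name": "Round 3"}
update = {"type": "Update", "object": diff_update(before, after)}
```

Like a commit, `update` records only what changed, and applying it to `before` reproduces `after`.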

Implementation suggestions

Don't worry about JSON or HTTP except where you interface with federated systems. You still gain the advantages of ActivityPub for managing distributed collections of objects, and future archivists will be able to bring your collections onto the network reliably, even if the tools don't exist today.

Activities are Objects that reference other Objects, and the Object referenced may itself be an Activity. In electronic systems with random access, normalize the internal representation of Activities so that other Objects are included by reference, not quoted, and treat this normalized form as immutable. Sharing Activities with compliant implementations requires expanding the tree. This can be done by traversal, and an index can speed it up. You should probably use the expanded form, in YAML or a similarly expressive format, for offline storage, e.g. tape backups or paper copies.
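
A sketch of the normalize/expand round trip, using a plain dict keyed by id as the index. `REF_KEYS`, `normalize` and `expand` are hypothetical; which properties count as Object references is an assumption (a fuller version would cover more of them), and the traversal assumes the reference graph has no cycles.

```python
# Properties this sketch treats as Object references (an assumption; a fuller
# implementation would cover more, e.g. inReplyTo or attachment).
REF_KEYS = ("object", "target", "origin")


def normalize(node: dict, store: dict[str, dict]) -> str:
    """File node in the store with embedded Objects replaced by their ids.

    Returns node's id. Stored forms are treated as immutable: re-filing an id
    with different content is an error rather than an overwrite.
    """
    flat = dict(node)  # never mutate the caller's tree
    for key in REF_KEYS:
        value = flat.get(key)
        if isinstance(value, dict):
            flat[key] = normalize(value, store)
    existing = store.get(flat["id"])
    if existing is not None and existing != flat:
        raise ValueError(f"stored form of {flat['id']} is immutable")
    store[flat["id"]] = flat
    return flat["id"]


def expand(object_id: str, store: dict[str, dict]) -> dict:
    """Rebuild the nested tree that compliant implementations expect.

    References not found in the store (e.g. a Collection id) stay as ids.
    """
    node = dict(store[object_id])
    for key in REF_KEYS:
        value = node.get(key)
        if isinstance(value, str) and value in store:
            node[key] = expand(value, store)
    return node
```

Because each expanded record stands alone, the output of `expand` is also a reasonable thing to dump to YAML for tape or paper.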

Federation requires object notation in a tree format, which has the benefit of making no demands on the internal data structures of federated services. Archival applications that maintain an index of the Objects they have received, however, may want to synchronize using arrays or lists, especially when using more verbose alternatives to JSON for communication outside the Fediverse, synchronizing Objects across multiple collections, or synchronizing infrequently.
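
The article doesn't fix a synchronization protocol, so here is one possibility, sketched under an assumption: both sides can dump the ids in their index as sorted, de-duplicated lists (in a file, on tape or on paper). A single merge-style pass then tells each side what the other is missing. `reconcile` is a hypothetical helper, not part of ActivityPub.

```python
def reconcile(local: list[str], remote: list[str]) -> tuple[list[str], list[str]]:
    """Walk two sorted, de-duplicated id lists once.

    Returns (ids the remote side lacks, ids the local side lacks).
    """
    send: list[str] = []
    fetch: list[str] = []
    i = j = 0
    while i < len(local) and j < len(remote):
        if local[i] == remote[j]:
            i += 1
            j += 1
        elif local[i] < remote[j]:
            send.append(local[i])
            i += 1
        else:
            fetch.append(remote[j])
            j += 1
    # Whatever remains on either side has no counterpart on the other.
    send.extend(local[i:])
    fetch.extend(remote[j:])
    return send, fetch
```

A single sequential pass suits media that are read front to back, like tape, and the same lists can be compared for each collection an Object belongs to.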

A final note

These ideas grew out of various failed attempts by the Diplomacy-playing community to standardise information interchange formats for postal and electronic archives, and recently resurfaced in a discussion with a friend about their desire for a flexible archive format. This article is an attempt to record the ideas while they are fresh in my mind, because I plan to revisit gaming archives after I have games running.

The main purpose of [yaaps](https://gitlab.com/swift2plunder/yaaps) is to provide ActivityPub federation for arbitrary data stores. My work on [Sputnik Opphuichi](https://gitlab.com/swift2plunder/sputnik/-/milestones/2) grew out of the need to provide a target data source for the reference implementation.

If you'd like to help me survive capitalism with sufficient means to continue working on this, please see my [Liberapay page](https://liberapay.com/YAAPS) for more information. Thank you!