v2.8.0

@edsu edsu released this 23 Oct 20:46
· 183 commits to main since this release
a5209e6

v2.8.0 adds new controls for shaping the data returned from the Twitter API. By default, twarc retrieves the fullest representation of a tweet by requesting all tweet, user, media, place, and poll fields, as well as all available expansions. This is generally good practice with twarc because downstream processing of the collected data can rely on having all of this data at its disposal. However, there may be cases where you want to customize the data that comes back. Doing so is not recommended practice, but it could be useful in some contexts.

The options below let you fine-tune the types of data requested by these sub-commands: search, searches, tweet, sample, hydrate, users, mentions, timeline, timelines, conversation, conversations, and stream. The options are:

  --expansions TEXT               Comma separated list of expansions to
                                  retrieve. Default is all available.
  --tweet-fields TEXT             Comma separated list of tweet fields to
                                  retrieve. Default is all available.
  --user-fields TEXT              Comma separated list of user fields to
                                  retrieve. Default is all available.
  --media-fields TEXT             Comma separated list of media fields to
                                  retrieve. Default is all available.
  --place-fields TEXT             Comma separated list of place fields to
                                  retrieve. Default is all available.
  --poll-fields TEXT              Comma separated list of poll fields to
                                  retrieve. Default is all available.

These correspond to the API Fields and Expansions.
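
As a rough illustration of how these options fit together, the Python sketch below assembles an argument list for a `twarc2 search` call and shows the Twitter API v2 query parameter each option is assumed to correspond to. The helper name `build_search_command`, the query, and the output filename are hypothetical; the option names are the ones listed above, and the example field values (`id`, `text`, `created_at`, `author_id`, `username`) are ordinary API v2 field and expansion names.

```python
import shlex

# Option names are taken from the release notes. The values are the
# Twitter API v2 query parameters they are assumed to correspond to.
OPTION_TO_API_PARAM = {
    "--expansions": "expansions",
    "--tweet-fields": "tweet.fields",
    "--user-fields": "user.fields",
    "--media-fields": "media.fields",
    "--place-fields": "place.fields",
    "--poll-fields": "poll.fields",
}


def build_search_command(query, outfile, **selected):
    """Return an argv list for a `twarc2 search` call (hypothetical helper).

    Keyword names use underscores (tweet_fields=...) and each value is a
    list of field names that is joined into a comma separated string.
    """
    argv = ["twarc2", "search"]
    for name, values in selected.items():
        option = "--" + name.replace("_", "-")
        if option not in OPTION_TO_API_PARAM:
            raise ValueError(f"unknown option: {option}")
        argv += [option, ",".join(values)]
    return argv + [query, outfile]


argv = build_search_command(
    "blacklivesmatter",   # example query, not from the release notes
    "tweets.jsonl",       # example output file
    tweet_fields=["id", "text", "created_at"],
    expansions=["author_id"],
    user_fields=["username"],
)

# Print the command as it could be typed in a shell.
print(shlex.join(argv))
```

Building the command as a list and only joining it for display keeps the comma separated values intact if the list is later handed to something like `subprocess.run`.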

There is also --minimal-fields, which requests just a minimal subset of data, and --no-context-annotations, which omits context annotations and so allows more tweets to be fetched at one time (500 instead of 100). Both options apply to the same sub-commands: search, searches, tweet, sample, hydrate, users, mentions, timeline, timelines, conversation, conversations, and stream.

  --minimal-fields                By default twarc gets all available data.
                                  This option requests the minimal retrievable
                                  amount of data - only IDs and object
                                  references are retrieved. Setting this makes
                                  --max-results 500 the default. NOTE: This
                                  argument is mutually exclusive with
                                  arguments: [--counts-only, --poll-fields,
                                  --media-fields, --expansions, --no-context-
                                  annotations, --place-fields, --user-fields,
                                  --tweet-fields].
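
The mutual-exclusion rule in the help text can be checked before a command is run. The sketch below is not twarc's own validation code; it is a minimal, assumed reimplementation of the rule as stated above, and both function names are hypothetical. The 100 and 500 defaults are taken from the text and are assumed here to apply uniformly, which may not hold for every sub-command.

```python
# Options the help text lists as mutually exclusive with --minimal-fields.
EXCLUSIVE_WITH_MINIMAL = {
    "--counts-only",
    "--poll-fields",
    "--media-fields",
    "--expansions",
    "--no-context-annotations",
    "--place-fields",
    "--user-fields",
    "--tweet-fields",
}


def option_names(argv):
    """Collect option names, accepting both `--opt value` and `--opt=value`."""
    return {arg.split("=", 1)[0] for arg in argv if arg.startswith("--")}


def conflicts_with_minimal(argv):
    """Return the options in argv that cannot be combined with --minimal-fields."""
    names = option_names(argv)
    if "--minimal-fields" not in names:
        return []
    return sorted(names & EXCLUSIVE_WITH_MINIMAL)


def default_max_results(argv):
    """Assumed default page size: 500 without context annotations, else 100."""
    names = option_names(argv)
    if names & {"--minimal-fields", "--no-context-annotations"}:
        return 500
    return 100


bad = ["twarc2", "search", "--minimal-fields", "--tweet-fields", "id", "query", "out.jsonl"]
print(conflicts_with_minimal(bad))
print(default_max_results(["twarc2", "search", "--no-context-annotations", "query", "out.jsonl"]))
```

A wrapper script could call `conflicts_with_minimal` first and refuse to launch the command when the returned list is not empty.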