Graph your email



Introduction

If you've been a netizen for some time then you've probably accumulated quite a lot of email by now. You can run basic searches on your email archives, but more sophisticated kinds of analysis are also possible. These may reveal information that you weren't aware of, and (at least in 2012) they aren't a routine feature of most email clients.

emailgraph is a command-line utility that takes email in the form of a mailbox file or a set of text files and produces social graphs and other types of graphical representation. You can also produce subgraphs based upon particular keywords, so that you can investigate networks of people talking about a particular topic.

You can also restrict the search to a range of dates or to particular times of day. Any dot file viewing program may be used to show the resulting graphs, such as dotty, xdot or kgraphviewer.

Installation

Install from source as follows:

make
sudo make install

If you wish to generate a Debian package see https://github.com/bashrc/emailgraph-debian

To plot usage graphs you will need to have gnuplot installed, and Graphviz and xdot are useful for rendering and viewing the dot files. For example, on a Debian-based system:

sudo apt-get install gnuplot graphviz xdot

Options

| Short | Verbose               | Description                                                         |
|-------|-----------------------|---------------------------------------------------------------------|
| -m    | --mbox <filename>     | Mailbox filename                                                    |
| -d    | --dir <directory>     | Directory to search for emails                                      |
| -c    | --contacts            | Print contacts                                                      |
| -t    | --top                 | Print the most active contacts                                      |
| -i    | --influence           | Print contacts in order of influence                                |
|       | --valence             | Print contacts in order of valence                                  |
|       | --sentiment           | Show valence in plots and social graphs                             |
|       | --sentimentneg        | Show negative valence in plots and social graphs                    |
|       | --start <dd/mm/yyyy>  | Start date                                                          |
|       | --end <dd/mm/yyyy>    | End date                                                            |
|       | --starthour <hour>    | Start time in hours (0-24)                                          |
|       | --endhour <hour>      | End time in hours (0-24)                                            |
|       | --dot                 | Print in dot file format                                            |
| -w    | --linewidth <width>   | Maximum line width in dot file                                      |
|       | --nonames             | Don't use names within diagrams                                     |
|       | --monthly             | CSV containing monthly email volumes                                |
|       | --yearly              | CSV containing yearly email volumes                                 |
|       | --daily               | CSV containing daily average email volume for each hour             |
|       | --weekly              | CSV containing weekly average email volume for each hour            |
| -s    | --sentences           | Print sentences for subsequent semantic analysis                    |
|       | --plotdensity <image> | Plot email density to an image file                                 |
|       | --plotdaily <image>   | Plot daily email volumes to an image file                           |
|       | --plotweekly <image>  | Plot weekly email volumes to an image file                          |
|       | --plotmonthly <image> | Plot monthly email volumes to an image file                         |
|       | --plotannual <image>  | Plot annual average email volumes to an image file                  |
|       | --keywords <words>    | Search for keywords                                                 |
|       | --corpus              | Prints all emails which contain matching keywords                   |
|       | --title <title>       | Title to be added to a plot                                         |
|       | --min <number>        | Minimum number of interactions                                      |
|       | --gender <M/F/MF>     | Select only males (M), females (F) or male/female interactions (MF) |
|       | --merge <filename>    | Specify a merge file containing contacts to be merged               |
|       | --exclude <filename>  | Specify an exclude file containing contacts to be removed           |
|       | --tests               | Run unit tests                                                      |
| -v    | --version             | Show version number                                                 |
| -h    | --help                | Show help                                                           |

Example social graphs

To create a social graph in dot format for an entire email archive

emailgraph -m mboxfile --dot > social-graph.dot

An email stream may also be piped in from elsewhere

Like this:

cat mboxfile | emailgraph --dot > social-graph.dot
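Because the stream is read from standard input, several mailbox files can be combined with cat before they reach emailgraph. An mbox file is a sequence of messages, each introduced by a "From " separator line, so concatenating mailboxes yields another valid mailbox. The following is a minimal sketch: the two sample mailboxes and their file names are hypothetical and exist only to make the example self-contained, and the emailgraph call is skipped when the tool is not installed.

```shell
# Hypothetical sample mailboxes, created only so the sketch is self-contained.
workdir=$(mktemp -d)
cat > "$workdir/inbox.mbox" <<'EOF'
From alice@example.org Mon Jan  1 10:00:00 2007
From: alice@example.org
To: bob@example.org
Subject: hello

Hi Bob

EOF
cat > "$workdir/sent.mbox" <<'EOF'
From bob@example.org Tue Jan  2 11:00:00 2007
From: bob@example.org
To: alice@example.org
Subject: Re: hello

Hi Alice

EOF

# Each message starts with a "From " separator line, so counting those
# lines gives the number of messages in the combined stream.
message_count=$(cat "$workdir/inbox.mbox" "$workdir/sent.mbox" | grep -c '^From ')
echo "messages in combined stream: $message_count"

# Pipe the combined stream into emailgraph when it is available.
if command -v emailgraph >/dev/null 2>&1; then
    cat "$workdir/inbox.mbox" "$workdir/sent.mbox" | emailgraph --dot > "$workdir/social-graph.dot"
fi
```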

You can also process multiple emails within a directory structure

This will recursively examine subdirectories and attempt to read all files within them:

emailgraph -d emaildirectory --dot > social-graph.dot

To produce an anonymized social graph

This is recommended if you are going to publish the graphs publicly and wish the names of individuals to remain private.

emailgraph -m mboxfile --dot --nonames > social-graph.dot

To limit between dates

Where the date is given in dd/mm/yyyy format:

emailgraph -m mboxfile --dot --start 20/1/2007 --end 30/5/2009 > social-graph.dot
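If your dates come from elsewhere in ISO yyyy-mm-dd form, they need to be rearranged before being passed to --start and --end. The helper below is a small sketch of that conversion; iso_to_dmy is a hypothetical name, not part of emailgraph, and it emits day and month without zero padding to match the example above.

```shell
# Convert an ISO date (yyyy-mm-dd) to d/m/yyyy without zero padding.
# iso_to_dmy is a helper invented for this example.
iso_to_dmy() {
    echo "$1" | awk -F- '{ printf "%d/%d/%d\n", $3, $2, $1 }'
}

start=$(iso_to_dmy 2007-01-20)
end=$(iso_to_dmy 2009-05-30)

# Show the resulting command line rather than running it.
echo "emailgraph -m mboxfile --dot --start $start --end $end"
```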

You can also specify a start date relative to the current date using the options:

--start month/3month/6month/year/2year/3year/4year/5year

Such as the following, to show a social graph for the previous 6 months:

emailgraph -m mboxfile --dot --start 6month > social-graph.dot

To limit between times of day (e.g. within working hours):

emailgraph -m mboxfile --dot --starthour 9 --endhour 17 > social-graph.dot

To investigate graphs where participants are using particular keywords

Keywords are not case sensitive and can also include names or email addresses. Multiple keywords are separated by commas.

emailgraph -m mboxfile --dot --keywords "foo,bar" > social-graph.dot
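For longer keyword lists it can be easier to keep one keyword per line in a file and join the lines with commas when calling emailgraph. This is only a sketch: keywords.txt is a hypothetical file name, and it assumes that no keyword itself contains a comma.

```shell
# Hypothetical keyword list, one keyword per line.
workdir=$(mktemp -d)
cat > "$workdir/keywords.txt" <<'EOF'
foo
bar
baz
EOF

# Join the lines into a single comma-separated string.
keywords=$(paste -s -d, "$workdir/keywords.txt")
echo "$keywords"

# Use the joined list when emailgraph and the mailbox are both present.
if command -v emailgraph >/dev/null 2>&1 && [ -f mboxfile ]; then
    emailgraph -m mboxfile --dot --keywords "$keywords" > social-graph.dot
fi
```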

You can also use the --min option to show only those individuals with more than a given number of interactions. The --corpus option may be used to print all emails which contain matching keywords.

emailgraph -m mboxfile --keywords "foo,bar" --corpus > emails.txt

Dot files may be viewed using other utilities, such as dotty, xdot or kgraphviewer.
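If you would rather have an image file than an interactive viewer, the dot command from Graphviz can render a dot file directly. The sketch below uses a small hand-written graph as a stand-in so that it is self-contained; it is not meant to reproduce the exact output that emailgraph generates, and the names in it are invented.

```shell
# Hand-written stand-in for a social graph, for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/social-graph.dot" <<'EOF'
graph social {
    "Alice" -- "Bob" [penwidth=3];
    "Bob" -- "Carol" [penwidth=1];
}
EOF

# Render to SVG when Graphviz is installed.
if command -v dot >/dev/null 2>&1; then
    dot -Tsvg "$workdir/social-graph.dot" -o "$workdir/social-graph.svg"
    echo "rendered $workdir/social-graph.svg"
else
    echo "Graphviz not found, skipping render"
fi
```

The same command works on a file produced with the --dot option; replace -Tsvg with -Tpng for a bitmap.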

Merging contacts

In some cases you may see multiple nodes in the social graph which correspond to the same person. This may be because their name is spelled differently, or because they have more than one email address. To make the graph neater and easier to interpret, you can manually specify the names or email addresses of individuals whom you wish to be merged together. All entries are case sensitive.

To do this, create a merge file (here called mymerges.txt) such as the following:

J Smith,John Smith
Clarke K,Clarke Kent
Susan,Susan Jones

In each case the first name is the name as it currently appears in the graph and the second is the name that you wish it to be changed to. Then use the --merge option, for example:

emailgraph -m mboxfile --dot --merge mymerges.txt > social-graph.dot
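A malformed line in the merge file is easy to miss, so it can be worth checking the file before use. The sketch below writes the example merge file and reports any line that does not consist of exactly two non-empty, comma-separated fields. It assumes that names never contain commas themselves; how emailgraph treats malformed lines is not covered here.

```shell
# Write the example merge file: current name, then replacement name.
workdir=$(mktemp -d)
cat > "$workdir/mymerges.txt" <<'EOF'
J Smith,John Smith
Clarke K,Clarke Kent
Susan,Susan Jones
EOF

# Collect lines that do not have exactly two non-empty fields.
bad_lines=$(awk -F, 'NF != 2 || $1 == "" || $2 == "" { print NR ": " $0 }' "$workdir/mymerges.txt")

if [ -n "$bad_lines" ]; then
    echo "malformed merge entries:"
    echo "$bad_lines"
else
    echo "merge file looks well formed"
fi
```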

Excluding contacts

Contacts can also be excluded from the social graph. This works in a similar manner to merging, but uses the --exclude option to specify a file containing names to be removed. Names are case sensitive. For example, you could create a file named banned.txt like this:

Kevin Killjoy
Mr Troll
Doctor Strangecode

Then call emailgraph as follows:

emailgraph -m mboxfile --dot --exclude banned.txt > social-graph.dot
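Since names are case sensitive, stray whitespace or duplicate entries in a hand-edited list can be a source of confusion. The sketch below tidies a rough list into banned.txt by trimming each line, dropping blank lines and removing duplicates. raw-names.txt is a hypothetical input file, and whether emailgraph itself tolerates blank lines or padding is not stated here, so treat this as a precaution rather than a requirement.

```shell
# Hypothetical rough list with padding, a blank line and a duplicate.
workdir=$(mktemp -d)
cat > "$workdir/raw-names.txt" <<'EOF'
Kevin Killjoy
  Mr Troll  

Doctor Strangecode
Kevin Killjoy
EOF

# Trim leading and trailing whitespace, drop blank lines, remove duplicates.
sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$workdir/raw-names.txt" \
    | grep -v '^$' \
    | sort -u > "$workdir/banned.txt"

cat "$workdir/banned.txt"
```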

Example usage graphs

These plots require gnuplot to be installed.

To plot email volume for each month, with a specified title:

emailgraph -m mboxfile --plotmonthly monthly-emails.png --title "My graph title"

To plot email volume for each day of the week:

emailgraph -m mboxfile --plotweekly weekly-emails.png

To plot email volume for each hour of the day:

emailgraph -m mboxfile --plotdaily daily-emails.png

To plot individual emails as points:

emailgraph -m mboxfile --plotdensity email-density.png

You can also combine this with keywords and date or time limits, as required.
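To generate all four of the plots above in one go, the plot options can be driven from a loop. This is a rough sketch: plot_command is a helper invented for this example, the output file names are arbitrary, and when emailgraph or the mailbox file is missing the loop only prints the commands it would have run.

```shell
mbox="mboxfile"

# Build the command line for one plot type (monthly, weekly, daily or density).
plot_command() {
    echo "emailgraph -m $mbox --plot$1 $1-emails.png"
}

for kind in monthly weekly daily density; do
    if command -v emailgraph >/dev/null 2>&1 && [ -f "$mbox" ]; then
        emailgraph -m "$mbox" "--plot$kind" "$kind-emails.png"
    else
        # Dry run: show the command instead of executing it.
        plot_command "$kind"
    fi
done
```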

Sentiment analysis

Usage and social graphs may also be produced for measurements of valence, that is, how positive or negative the language used in the emails is.

For example the following plots monthly total valence values:

emailgraph -m mboxfile --plotmonthly monthly.png --sentiment

If you are only interested in negative valence, use the --sentimentneg option instead:

emailgraph -m mboxfile --plotmonthly monthly.png --sentimentneg

Graph node shading effects

Social graph nodes may have their background shading altered depending upon the average valence values for emails sent by each individual:

emailgraph -m mboxfile --dot --sentiment > social-graph.dot

Here lighter background shades indicate more positive valence, and darker shades represent lower or more negative valence values.