I’m writing my thesis right now, so I haven’t had much time to post.
I am now going through my literature review, and I have been looking for ways to store and analyse all my citations so I can do a bit of bibliometrics.
Long story short, after trying JSON and XML, I stumbled across YAML. So now I feed a YAML file into Python and look at how many publications each author has, which years were the most active, which journals are the most common, and so on.
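The counting itself can be sketched roughly as follows. This is only a minimal illustration: it assumes the YAML has already been loaded into a list of dictionaries, and that shape, the `tally` helper and the "Placeholder" entries are made up for the example rather than taken from my real data.

```python
from collections import Counter

# Hypothetical stand-in for the loaded YAML data. The list-of-dicts shape
# is an assumption; the "Placeholder" entries are invented for illustration.
entries = [
    {"author": ["von Hippel, Eric"],
     "journal": "Industrial and Corporate Change", "year": 2007},
    {"author": ["von Hippel, Eric", "Placeholder, Ann"],
     "journal": "Placeholder Journal", "year": 2007},
    {"author": ["Placeholder, Ann"],
     "journal": "Placeholder Journal", "year": 2010},
]


def tally(entries):
    """Count publications per author, per year and per journal."""
    authors = Counter(a for e in entries for a in e.get("author", []))
    years = Counter(e["year"] for e in entries if "year" in e)
    journals = Counter(e["journal"] for e in entries if "journal" in e)
    return authors, years, journals


authors, years, journals = tally(entries)
print(authors.most_common(3))   # most prolific authors
print(years.most_common(3))     # most active years
print(journals.most_common(3))  # most common journals
```

`Counter.most_common(n)` returns the `n` largest counts, which covers most of the simple rankings I am after.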
All my citations, however, are in bib format (I use LaTeX), so I need to convert them to YAML. In other words, the input bib entries look like this:
@article{Hippel:horiInno:2007,
	Author = {von Hippel, Eric},
	Date-Modified = {2014-07-02 10:47:11 +0000},
	Journal = {Industrial and Corporate Change},
	Number = {2},
	Pages = {293 - 315},
	Read = {1},
	Title = {Horizontal Innovation Networks By and For Users},
	Volume = {16},
	Year = {2007}}
and I need them to look like this:
-article: &HippelhoriInno2007
    author:
        -von Hippel, Eric
    date-modified: "2014-07-02 10:47:11 +0000"
    journal: "Industrial and Corporate Change"
    number: 2
    pages: "293 - 315"
    title: "Horizontal Innovation Networks By and For Users"
    volume: 16
    year: 2007
Pretty straightforward, if you ask me.
To make things easier, I have also added very basic command-line arguments for the input and output files. This means that if you save the script as ‘bib2yaml.py’, you can use it like so:
$ python bib2yaml.py my_input_file.bib my_output_file.yaml
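The script simply reads `sys.argv[1]` and `sys.argv[2]`, so a missing argument ends in a bare `IndexError`. If you want a friendlier usage message, something along these lines with the standard `argparse` module should do. It is only a sketch, and the names `parse_args`, `bib_in` and `yaml_out` are my own choices for the example, not part of the script.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments: input bib file, output yaml file."""
    parser = argparse.ArgumentParser(
        description="Convert a bib file to a YAML file.")
    parser.add_argument("bib_in", help="path to the input .bib file")
    parser.add_argument("yaml_out", help="path to the output .yaml file")
    # argv=None makes argparse fall back to the real command line
    return parser.parse_args(argv)


# Example call with an explicit argument list instead of the real command line
args = parse_args(["my_input_file.bib", "my_output_file.yaml"])
print(args.bib_in, args.yaml_out)
```

Run with no arguments from the terminal, `argparse` prints a short usage line and exits instead of raising a traceback.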
Anyhow, here is the code. So far I have only tested it on article entries, not on books or other media.
# bib2yaml.py
import re
import sys

# input and output paths from the terminal arguments
str_input = sys.argv[1]
str_output = sys.argv[2]

# read the bib file
with open(str_input, 'r') as fr:
    list_lines = fr.readlines()

# fields written as bare integers instead of quoted strings
list_ints = ['number', 'volume', 'read', 'year']

# collect the output lines
list_output = []

# go through the lines
for str_line in list_lines:
    # first line of an entry: type and id
    if str_line.startswith('@'):
        sg_t1 = re.search(r'^@(.*)\{(.*),$', str_line)
        if sg_t1 is None:
            continue  # skip lines that do not look like '@type{id,'
        # colons are not allowed in the anchor name, so drop them
        str_id = sg_t1.group(2).replace(':', '')
        str_first = '\n-%s: &%s' % (sg_t1.group(1), str_id)
        list_output.append(str_first)
    # the field lines (they must start with a tab)
    elif str_line.startswith('\t'):
        sg_tn = re.search(r'^\t(.*) = \{(.*?)\}', str_line)
        if sg_tn is None:
            continue  # skip lines that are not 'Key = {value}'
        str_cat = sg_tn.group(1).lower()
        str_val = sg_tn.group(2)
        # make a list of all the authors
        if str_cat == 'author':
            list_authors = str_val.split(' and ')
            str_auths = '\n        -'.join(list_authors)
            str_aut_out = '    %s:\n        -%s' % (str_cat, str_auths)
            list_output.append(str_aut_out)
        # all the integer values
        elif str_cat in list_ints:
            str_jt_out = '    %s: %s' % (str_cat, str_val)
            list_output.append(str_jt_out)
        # all the string values
        else:
            str_gen_out = '    %s: "%s"' % (str_cat, str_val)
            list_output.append(str_gen_out)

# write the yaml file
with open(str_output, 'w') as fw:
    fw.write('\n'.join(list_output))