Explore Company Officers of EDF with Ropencorporate
This article follows a first article on how to extract information about a specific company from the OpenCorporates database.
The first article focused on companies; this one focuses on officers.
There are several ways to get information on officers through the API: GET officers/search, which returns officers matching a name; GET officers/:id, which returns the information attached to a given id; and GET companies/:jurisdiction_code/:company_number, which, given a company number, returns the ids of the company's officers.
get.officers is a wrapper for the first method. The second method is of little interest here: there is no single id per real person, so querying by id cannot be used to link the same person across companies. The third method is implemented through the function called below to query company details.
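For readers who prefer to call the API directly, here is a minimal sketch that only builds the request URLs of the three endpoints. The base URL and the `v0.4` version prefix are assumptions about the current OpenCorporates API, and the helper names (`oc_officer_search_url`, `oc_officer_url`, `oc_company_url`) are hypothetical, not part of Ropencorporate.

```r
# Hypothetical helpers: build request URLs for the three officer-related
# endpoints. Base URL and API version are assumptions; check the
# OpenCorporates API documentation before relying on them.
oc_base <- "https://api.opencorporates.com/v0.4"

# GET officers/search: officers whose name matches the query
oc_officer_search_url <- function(name) {
  paste0(oc_base, "/officers/search?q=", URLencode(name, reserved = TRUE))
}

# GET officers/:id: a single officer record
oc_officer_url <- function(id) {
  paste0(oc_base, "/officers/", id)
}

# GET companies/:jurisdiction_code/:company_number: a company and its officers
oc_company_url <- function(jurisdiction_code, company_number) {
  paste0(oc_base, "/companies/", jurisdiction_code, "/", company_number)
}

oc_officer_search_url("Robert Miller")
# returns "https://api.opencorporates.com/v0.4/officers/search?q=Robert%20Miller"
```

A URL built this way could then be fetched and parsed, for instance with `jsonlite::fromJSON(url)`; depending on your usage, the API may also require an API token.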
The questions we try to answer here are: who are the key officers of EDF, and which officers are the most central?
```r
library(Ropencorporate, quietly = TRUE)
library(data.table)
library(DT)
library(stringr)
library(stringi)
library(networkD3, quietly = TRUE, warn.conflicts = FALSE)
```
First, we load the details of all the companies matching the term “EDF”, saved in the first article.
```r
# Restores the object res.oc; path and term are set as in the first article
load(file = paste0(path, "DATA/res_oc_", term, ".rda"))
```
Then we query the details of these companies:
```r
company.number <- res.oc$oc.dt[, company.number]
jurisdiction.code <- res.oc$oc.dt[, jurisdiction.code]
company.out.l <- get.comp.number(company.number, jurisdiction.code)
```
The function returns a list of 4 data tables (data frames if the data.table package is not loaded). The first table gives details of the companies and the second details of their officers.
Combined, they give an estimate of how officers and companies are interconnected. The uncertainty comes from duplicate company names within a jurisdiction (unlikely) and duplicate officer names across companies (likely). So be warned: a single node may actually stand for several people or companies.
We are fairly confident that there are not two companies named EDF LLC in the UK, but not at all confident that there is only one Robert Miller among the officers of companies with EDF in their name.
One possible project would be to compute a scarcity score for each first name and surname. A rare name appearing twice would then indicate a high confidence that both occurrences are the same person, while a common name appearing many times would indicate a low confidence.
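A minimal sketch of such a score, assuming each cleaned name has the form "Given ... Surname" and estimating name frequencies from the officer list itself (an external reference, such as national name statistics, would be more reliable). The function `name_scarcity` is hypothetical: it returns `-log2` of the estimated probability of the given name times that of the surname, treating the two as independent, so rarer names get higher scores.

```r
# Hypothetical scarcity score: higher = rarer name, so repeated occurrences
# are more likely to be the same person. Assumes "Given ... Surname" names.
name_scarcity <- function(full_names) {
  full_names <- trimws(full_names)
  uniq <- unique(full_names)
  tokens <- strsplit(uniq, "\\s+")
  given <- vapply(tokens, function(x) x[1], character(1))
  surname <- vapply(tokens, function(x) x[length(x)], character(1))
  # Frequencies among distinct full names, so that one person listed
  # many times does not inflate the counts
  p_given <- table(given) / length(uniq)
  p_surname <- table(surname) / length(uniq)
  idx <- match(full_names, uniq)
  # Independence assumption: P(name) = P(given) * P(surname)
  -log2(as.numeric(p_given[given[idx]]) * as.numeric(p_surname[surname[idx]]))
}

officers <- c("Robert Miller", "Robert Miller", "Robert Smith", "Jean Dupont")
round(name_scarcity(officers), 2)
# returns 2.17 2.17 2.17 3.17
```

Here "Jean Dupont" scores higher than "Robert Miller" because "Robert" is shared by two distinct names in the sample, so two occurrences of "Jean Dupont" could be merged with more confidence.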
Some cleaning of the names can nevertheless be done. For example, we can drop middle-name initials, at the cost of a little precision in the result.
```r
officers.comp.dt <- company.out.l$officers.comp.dt

# Title-case the names and drop middle-name initials such as " A."
officers.comp.dt[, name.clean := gsub(" [A-Z]\\.", "",
                                      stri_trans_totitle(tolower(name)))]
# Title-case the officer positions
officers.comp.dt[, position := stri_trans_totitle(tolower(position))]
```
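To see what this cleaning step does, and what precision it costs, here is a small base-R illustration on made-up names. It applies the same idea as the `gsub` call above (remove a space, a capital letter and a dot); the helper `drop_middle_initials` is hypothetical.

```r
# Hypothetical helper: remove middle initials such as " A." from title-cased names
drop_middle_initials <- function(name) {
  gsub(" [A-Z]\\.", "", name)
}

officers <- c("John A. Smith", "John B. Smith", "Marie C. Dupont")
drop_middle_initials(officers)
# returns "John Smith" "John Smith" "Marie Dupont"
```

"John A. Smith" and "John B. Smith", who may well be two different people, end up as a single name, and therefore as a single node in the graph: this is the loss of precision mentioned above.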
A simple count shows that many companies have “CORPORATION SERVICE COMPANY” as an officer. It is a large registered agent service company based in Delaware.
```r
# Number of rows per cleaned officer name and position
datatable(officers.comp.dt[, .N, by = c("name.clean", "position")][order(N, decreasing = TRUE)])
```
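Since we are looking for the most central officers, note that the count above is close to a degree measure, but it counts rows, so an officer listed twice for the same company counts twice. A minimal base-R sketch of degree centrality, i.e. the number of distinct companies linked to each officer, on a made-up edge list with hypothetical columns `officer` and `company`:

```r
# Made-up officer-company edge list (hypothetical data)
edges <- data.frame(
  officer = c("Corporation Service Company", "Corporation Service Company",
              "Corporation Service Company", "Jean Dupont", "Jean Dupont"),
  company = c("Company A", "Company B", "Company C",
              "Company D", "Company D"),
  stringsAsFactors = FALSE
)

# Degree centrality: number of distinct companies each officer is linked to
degree <- tapply(edges$company, edges$officer, function(x) length(unique(x)))
sort(degree, decreasing = TRUE)
```

Jean Dupont appears twice but for the same company, so his degree is 1. On the real data, the same computation applies to the cleaned officer names and the company names.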
For companies registered in the UK, the nationality of the officers is available. It shows that these companies are mainly represented by British and French officers.
```r
# Number of rows per officer nationality
datatable(officers.comp.dt[, .N, by = "nationality"][order(N, decreasing = TRUE)])
```
Now, we want to represent the links between officers and companies. For that, we use the forceNetwork function from networkD3, which draws a force-directed network graph. An officer holding several positions is represented by several nodes, one per position, so that each position can get its own color.
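forceNetwork expects a node table and a link table in which the source and target columns are 0-based row indices into the node table. The code that follows builds these tables with merges on the real data; as a smaller illustration, here is a base-R sketch on a made-up edge list that uses `match()` to turn names into 0-based ids. The column names are chosen for the example only.

```r
# Made-up edges: one row per (officer with position, company) pair
edges <- data.frame(
  officer = c("Jean Dupont, Director", "Jean Dupont, Secretary",
              "Jean Dupont, Director"),
  company = c("Company A", "Company A", "Company B"),
  stringsAsFactors = FALSE
)

# One node per distinct officer-position label and per company
nodes <- data.frame(name = unique(c(edges$officer, edges$company)),
                    stringsAsFactors = FALSE)
# Group: officers are grouped by position, companies get a group of their own
nodes$group <- ifelse(nodes$name %in% edges$company, "Company",
                      sub("^.*, ", "", nodes$name))

# Links: 0-based row indices into nodes, as networkD3 expects
links <- data.frame(source = match(edges$officer, nodes$name) - 1,
                    target = match(edges$company, nodes$name) - 1,
                    value = 1)
```

The same person appears as two nodes ("Director" and "Secretary"). These two tables could then be passed to `networkD3::forceNetwork(Links = links, Nodes = nodes, Source = "source", Target = "target", Value = "value", NodeID = "name", Group = "group")`.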
```r
company.id.dt <- company.out.l$company.id.dt

# Merge officers with the names of their companies
comp.officers <- merge(officers.comp.dt,
                       company.id.dt[, list(jurisdiction.code, company.number,
                                            name.company = name)],
                       by = c("jurisdiction.code", "company.number"))

# Simplify positions: any position held fewer than 20 times becomes "Other"
comp.officers[, position2 := ifelse(.N < 20, "Other", position), by = "position"]
# One node per officer and position
comp.officers[, name2 := paste0(name.clean, ", ", position2)]

# Nodes: officer-position labels and company names, with 0-based IDs
nodes.c <- unique(c(comp.officers[, name2], comp.officers[, name.company]))
node0 <- data.frame(ID = 0:(length(nodes.c) - 1), name = nodes.c,
                    size = 25, stringsAsFactors = FALSE)

# Groups: one color per position; companies get no group from this merge
nodes.with.group <- merge(node0,
                          unique(comp.officers[, list(name2,
                                 group = as.numeric(as.factor(position2)))]),
                          by.x = "name", by.y = "name2", all.x = TRUE)

# Companies: larger nodes and a group value of their own
is.company <- is.na(nodes.with.group$group)
nodes.with.group$size[is.company] <- 50
nodes.with.group$group[is.company] <- 30

# Restore the ID order
node <- nodes.with.group[order(nodes.with.group$ID), ]

# Links: count officer-company pairs, then map both ends to node IDs
links0 <- merge(
  merge(comp.officers[!(is.na(name.clean) | is.na(name.company)),
                      list(value = .N), by = c("name2", "name.company")],
        node0, by.x = "name2", by.y = "name", all.x = TRUE),
  node0, by.x = "name.company", by.y = "name", all.x = TRUE)
links <- data.table(links0[, list(source = ID.x, target = ID.y, value = value)])
```
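The rule that sends every position held fewer than 20 times to "Other" keeps the number of colors manageable. The same lumping can be sketched in base R with `ave()`; the function name `lump_rare` and its `min_n` argument are hypothetical, and the sketch assumes the input has no missing values.

```r
# Hypothetical helper: replace categories seen fewer than `min_n` times by "Other"
# (assumes x has no NA values)
lump_rare <- function(x, min_n = 20) {
  # For each element, the number of times its category occurs
  counts <- ave(seq_along(x), x, FUN = length)
  ifelse(counts < min_n, "Other", x)
}

positions <- c(rep("Director", 3), "Secretary", "Agent")
lump_rare(positions, min_n = 2)
# returns "Director" "Director" "Director" "Other" "Other"
```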
Graph of companies and officers by position
The final result is a huge graph of nearly 1 MB. You can find the live version here, but avoid opening it on a smartphone or over a slow connection.
```r
forceNetwork(Links = links, Nodes = node, Source = "source", Target = "target",
             Value = "value", NodeID = "name", Nodesize = "size",
             Group = "group", opacity = 1, zoom = TRUE)
```
For the article, an image is enough.