#161: Open Data at the New York Times, with Scott Feinberg, API Architect

Welcome to episode number 161 of CXOTalk.
I’m Michael Krigsman and today we have a very interesting show. I’m talking with
Scott Feinberg who is the chief API architect or I guess you are the API architect at the
New York Times, and we’re going to talk about the role of APIs in supporting the core mission
of the New York Times. Scott, how are you? Thank you so much for joining us today.

I’m great, thanks for having me.

So Scott, tell us, you’re the API architect
at the Times, let’s begin by what you actually do, what does that mean, and tell us about
the mission of the New York Times.

Sure, so as the API architect, you know, my
job is thinking about how we manage APIs. I spend most of my time working on systems
to make APIs better at the Times and consulting with teams as to how to build their APIs,
how they should be going about integrating with them. So it’s really, you know, any
team that wants to build a service that works for a lot of different teams, I’ll oftentimes
come and help in whatever way I can and make sure their services are going to work well.

So when you say you’re building services
for the New York Times, you know our audience is business people, so they might not even
know in general what that is and certainly not at a detailed level, and certainly not
how it connects to running the Times, so maybe explain that linkage to us.

Totally, so the New York Times at no other
time in history has been such a digital company. At the end of the day you think of us as this
newspaper, but at the end of the day we’re a digital content provider and the paper is
just another way that people get that content. So you know, we’re on every platform you
can think of. There’s the New York Times, we have lots of digital properties. You know
we have a real estate section, a huge video team. We’re on iOS, Android, on the web,
everywhere. You know the Times wants to be better. We have a huge cooking vertical now.
We’re leveraging 165 years’ worth of content and reworking that for a digital age where
not only are we on your doorstep but we’re in your house, we’re with you when making
food, we’re helping you make those lifestyle decisions. The New York Times is always there
in your pocket, and so as a digital organization we’re very much a technology company.
You know, there’s 400 engineers who work at the Times and we’re focused on building
these amazing products and these experiences but to do that we look you know a lot more
like Facebook or a Google than we do, say, a Wall Street Journal.

So digital is a fundamental core part of what
you’re doing and you have a large group of people who are focused on that. So when
you say you’re looking more like a Facebook or Google than a Wall Street Journal, maybe
could you elaborate on that because from the outside perspective the New York Times is
about delivering news, so we think about reporters and think about photographers getting the
news?

So Mark Thompson, our CEO, talks about it as
this pyramid, and you have the journalists who sit at the top and our job as an organization
all of us who support them in this pyramid, our job is to take the reporting that they
do out in the field and bring it to people where they need it. And informing people with
the information that they want to hear. We have journalists in something like 44 foreign
other bureaus and they’re all over the world, and our job is to take that content that they
write, build and get it out there. We are the new keeper, we are that distribution.
So the reporting is obviously like this is arguably some of the best content in the world.
Our job is to say, okay this is awesome. How can we make it reach the most people and make
the most impact?

So you’re creating content and now you’re
thinking about that content as part and parcel of multiple distribution channels, so the
print paper is one of those, but now on equal standing are the web, devices, and so forth.

Exactly.

First off, I was going to say where do APIs
come into this, but very briefly maybe just give us a layperson’s definition of when you
talk about APIs, what does that actually mean?

So all an API is, is it’s an interface to
a machine system. So it’s a way of expressing to a computer, to an application some sort
of program telling it this is the content that I want, this is the content that I want
you to store or to have. Same as when you open up an application like Photoshop, that’s
an interface to manipulate some images. APIs are ways to manipulate systems.

So when you say manipulate systems, again
for those of us who are not technologists at all, tell us what you mean and give us
an example.

Totally, so we have a system that’s basically
a CMS. It’s called Scoop and that system has some APIs that allow other systems to
give it articles and give it information about different things and also pull that information
out. So all it is, is a way for something like Scoop, other systems like, say, our website.
It’s a way for the website to say, hey, I want to show this article, so it makes
a call and says, hey, let’s just grab that, and then it decides how it wants to show it.

Okay, so you’ve essentially broken your
system up into many pieces, is that the right way to describe it?

Yes, so any enterprise these days, to iterate
on building software you have to split up your system at some point. There’s arguments
you know that you can continue building with one system but 400 people cannot work on the
same system all the time. It just doesn’t work.
So what ends up happening is you end up with all these different services that can be independently
built and independently changed and they interact through APIs. So maybe I have an article service
and an author service. And the article service pulls information from the author service
to add information to articles, and things like that. Maybe those are two physically
different teams and maybe we can improve both of those products, some independently or not
at all.

Okay, so you’ve broken it up into these
can we say micro services to use.

That’s a loaded term but yes.

Okay, so the New York Times has got all of
these pieces and some of them are graphics, and some are images, and some are recipes,
headlines, is that an accurate way of thinking about the Times?

So if you just look at the homepage of the
Times and you can see at the very top you see weather. That’s an API. You see the
watching feed on the right hand side, that’s an API. There’s a link to crossword, advertisement,
that’s an API. Recommendations another one. You know we estimate that to create the New
York Times experience, you’re talking between 40 and 50 different services. And those are
40 or 50 different APIs to build all of that content.

So we have now essentially the building
blocks of the New York Times, right, that’s what you’re describing.

Yeah.

So we have the building blocks of the Times,
so what’s the role, now we have these building blocks, what’s the role of the APIs?

The role of the APIs is that these are the
ways that when we build a new iOS app, when we build a new Android app we’re going to
leverage these same services. So you build the website once, and then you want to build
the Android app, you can leverage these same services to build that new app.
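
To make the reuse idea concrete, here is a minimal sketch, added for readers and not drawn from the Times’s actual systems. It assumes a hypothetical `get_article` service and two hypothetical clients, a website and a mobile notification, that call the same service and present the result differently. All names and data are made up for illustration.

```python
# Illustrative only: a tiny in-memory stand-in for a shared article service.
# The article data and every function name here are hypothetical.
ARTICLES = {
    "a1": {
        "headline": "City Council Passes Budget",
        "summary": "The vote was 7 to 2 after a long debate.",
    },
}


def get_article(article_id: str) -> dict:
    """The single 'API' every client calls; in production this would be an HTTP request."""
    return ARTICLES[article_id]


def render_for_web(article_id: str) -> str:
    """The website shows the headline and the full summary."""
    article = get_article(article_id)
    return f"<h1>{article['headline']}</h1><p>{article['summary']}</p>"


def render_for_mobile(article_id: str, max_chars: int = 20) -> str:
    """A phone notification shows only a shortened headline."""
    headline = get_article(article_id)["headline"]
    if len(headline) <= max_chars:
        return headline
    return headline[: max_chars - 3] + "..."


print(render_for_web("a1"))
print(render_for_mobile("a1"))
```

Neither client knows how articles are stored. A third product would only need its own rendering function on top of the same call.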
It can look completely different and it’s going to operate very differently. But at the
end of the day it’s calling into those same services. So as you build more and more of
these reusable services you can build new products faster.

I have to say this is really interesting,
because as somebody who has, like I guess most of us watching, grown
up with the New York Times, not thinking about the components in this way you’re describing
the building blocks. So okay, we’ve got our weather service or component or whatever
the term is we want to use, and we have our articles and our recipes. So you have decomposed
what we usually think of as our morning newspaper into these pieces. So first off,
what language do you use? What are each of these pieces called?

So we describe someone who is using an API
as a consumer and someone who is providing one as a provider. And sometimes a consumer
is also providing, and actually oftentimes that’s the case. But it’s this relationship
of who is making the call and who is receiving the call. So we have providers and consumers.
More often we just call the people who are using the APIs users and that user could be
an Android app. It could be an iPhone app, it could be the website.

Okay, so you’ve got the components, you’ve
got users. So the function then is how can we take this body of content, these providers
you call them, and chunk it and recombine the pieces so that it’s appropriate for
each platform, whether it’s a mobile, whether it’s a phone, website, what have you, recombined
in such a way that it’s going to take full advantage of that particular platform, bringing
the best content experience in a sense. Is that – I don’t mean to put words in your
mouth.

No, exactly, so the Apple Watch, that’s a
very different experience with very different needs than the desktop website. They’re
just different. You may not want recommendations on your wrist but you probably want breaking
news. So we’re able to leverage these same systems, these same ways that we’ve cut
up content. You know, morphed it into a form that’s more useful to us.
Recommendations is a great example. We know what you read, so we can build a service that
can say, these are articles that you might like. So by having these services available
we’re able to take what we need and not rebuild three different ways of doing the
same thing.

But it’s not just you though, right, because
you make the services available to the world – to developers.

Yeah, so there’s two aspects to that. If
you go to developer.nytimes.com, you can try out a lot of the APIs that are
public. So you can do things like searching for articles, you can get comments, there’s
top stories, find out what’s most popular. And it gives you the opportunity to build
experiences with our content. But in addition to that we’ve been providing
that content forever. Newspapers have always done this and they’re actually one of the
first sort of like content API providers, where you would actually, you know, fax, or send on
the wire, or, you know, mail, or send in person, copies of the stories that ran. So AP, great example,
that’s kind of all they do. Other people print what they write, and we print their stuff,
they print our stuff. So in addition to big media but also small ones too, you know, a tiny
newspaper can also grab that content and reuse it in their paper. So APIs, the external ones,
are just a different way of giving people that opportunity to use that content.

So then the APIs create efficiency because
in the past, as you say people have been sharing newspaper articles but this now allows developers
to do it in a systematic and very efficient way. Are there interesting examples that you
can point to of how people are using APIs?

Sure, yeah, so a lot of the people who actually
use them are typically researchers who are trying to find patterns. There’s been a
lot of cool like views of taking our content. You know we’ve 165 years of it and saying
okay, how has the spoken word changed or what stories have been popular you know between
like 1940 and 1970 and they do cool things with that.
But we have a lot of people who use it on their websites. They want all the articles
about a certain topic or for people to actually read, and there’s people who use our APIs without
ever actually interacting. There’s something called IFTTT. We have public
libraries that use our best sellers list to programmatically decide what books to buy.
There’s been a lot of interesting uses and we’re always looking to both learn from
our users what they want and also take those experiences and put them back into our main
product.

Now you also think about your APIs and I say
your APIs because you are the person who is designing these APIs, architecting them. You
think about these APIs in a very direct way as supporting the core mission of the Times and
again maybe restate what that core mission is and explain what in the world does that
have to do with APIs or vice versa.

Sure, so the core mission of the Times is to
enhance society by creating, collecting, and distributing high quality content basically.
Most of the APIs, most of the time, are involved with that distribution angle; sometimes they’re
used in creating new experiences. But most of the time they’re used for distributing
that great content, and without them it would be very hard for us to create a lot of different
information where people want it. So our mission isn’t to just print like
make a newspaper or to just give it to you in the static form. We want to inform and
give you the news and the information in whatever way makes the most sense for you.
So our cooking website started off as a realization that, hey, we have 17,000 recipes
dating back to the 1800s, I wonder if anyone would be interested in actually using those.
And because we have a service that already stored all of that, they were able to say
okay, I’m just going to pull down all of these recipes and try to work with them. And
that’s the kind of experience that you can build and reuse when you’ve already built
these services, and then it allows us to rapidly build new ways of spreading this news.

So these APIs in a sense also represent organizational
boundaries right, because you’re building APIs for weather and you must have a team
of people that manages the weather. You’re building APIs for recipes or what have you.
So the APIs are kind of a programmatic embodiment of the organizations, what do you think about
that?

Yeah, so you know there’s this term called
Conway’s Law.

So at some pure point, at, say, time zero in
the past there were no APIs. The APIs that are built reflect the structure of the information
and, as you said, Conway’s Law reflects the structure of the teams creating that information.

At that time.

At that beginning point, and you have to keep
those APIs around because people may want to use them. So what about the evolution process
of the APIs and how do you manage that inside the New York Times because I’m assuming
you also want to get rid of APIs because you want to – I was going to say force, but
let’s say gently encourage, or maybe it is forced. Gently encourage people to use
the information in a particular way. Maybe the Times has come up with new usage guidelines and maybe new
types of information, maybe you have figured out better ways that the jigsaw puzzle
can piece together.

Yeah.

So what do you do, how do you handle that?

This is what I refer to as the API lifecycle.
APIs are born, they are built, they are used and then all APIs must eventually die and
that’s something that’s really hard. Because when we first build a new service or system
we never want to think about that. We never want to think about this end state, where
this thing that I spent all this time building and we put a lot of resources in it will eventually
die. And that’s almost the most important part of using any API is understanding that
this will not be here forever. It might change. It might be improved or the thing that’s
powering it might no longer be here. So you know that’s why we are building processes
and building basically just like understanding of okay, I have this API that I want to go
and use and instead of just integrating with it so that it works, I make sure my app in
the event that I won’t be able to update it later on and let’s say it’s an iOS
app people never actually have to update it. It can go and live on for literally years,
and we’ve actually seen that happen. We have to make sure that in that app that
if my API dies and I can no longer use it then I can offer the same experience to that
user. Now if someone has an old version of our app, been there for like a year and they
go to reengage with the Times. They haven’t updated because they haven’t been using
it. They open up that app and it breaks. It just fails because that API’s dead and that
oftentimes is what happens if you don’t plan for these things.
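
One way to plan for a dead API at integration time can be sketched as follows. This is a reader’s illustration under stated assumptions, not the Times’s implementation: it assumes the retired API answers with HTTP 404 or 410 (410 Gone is the standard status for a permanently removed resource), and the `load_headlines` function, the messages, and the response shape are all hypothetical.

```python
import json
from typing import Callable, Tuple

# Hypothetical user-facing messages.
UPGRADE_MESSAGE = "This version of the app is out of date. Please update to keep reading."
RETRY_MESSAGE = "We can't load stories right now. Please try again later."

# Assumption: these statuses mean "this API has been retired".
RETIRED_STATUSES = {404, 410}


def load_headlines(fetch: Callable[[], Tuple[int, str]]) -> dict:
    """Call an API through `fetch` and always return something the UI can show."""
    try:
        status, body = fetch()
    except OSError:
        # Network trouble is temporary, so ask the user to retry.
        return {"ok": False, "message": RETRY_MESSAGE}
    if status in RETIRED_STATUSES:
        # The API is gone for good: tell the user to upgrade instead of crashing.
        return {"ok": False, "message": UPGRADE_MESSAGE}
    if status != 200:
        return {"ok": False, "message": RETRY_MESSAGE}
    try:
        headlines = [item["headline"] for item in json.loads(body)["results"]]
    except (ValueError, KeyError, TypeError):
        # The response shape changed underneath an old client.
        return {"ok": False, "message": UPGRADE_MESSAGE}
    return {"ok": True, "headlines": headlines}
```

The point of the design is that every failure path ends in a message the app chose in advance, so a user who returns after a year sees an upgrade prompt instead of a crash.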
Well that user, what’s the likelihood that they haven’t touched us in a year and their one
interaction is you crashing your app on their phone, or worse, it doesn’t immediately
crash. It looks like it’s working and then it just, like, breaks or gives something unexpected
like an actual error code to the user.
Those are things you never want to happen,
but can happen and do. So the API lifecycle is all about, okay, let me plan in my app what
am I going to do when this API is dead. How am I going to tell the user and maybe it’s
as simple as telling the user you should probably upgrade because this app is out of date. Thinking
about things like that at that integration point and not you know a year down the line
when that API you know inevitably dies.

So you’re really thinking almost from the
beginning about the obsolescence of the APIs that you’re creating and what’s going
to be the impact.

Exactly, like the only thing that we can be
sure of is that technology is going to change and our business needs are going to change.
If the past, you know, 20 years have taught newspapers anything, it’s that business models
change and we can’t count on the same things that worked you know 10, five, even one year
in the past. You know things are changing rapidly. So like from the outset we need to
make sure that we are planning with that in mind.

What’s the hardest part about designing
APIs at a place like the New York Times?

So the hardest part is not designing new APIs.
That’s kind of the easy part because today you know the New York Times has tons of really
smart people and you know they want to build really good things. You know the challenge
is when you’re building something new, to just put in the time to actually design it,
think about it, share with the stakeholders about how this thing might work, getting their
input and then actually building it. That’s almost the easy part.
The challenging parts are how do we deal with old things. How do we update old things because
you know, we’ve had a website for almost 20 years now and we have services that have
existed for you know, I think the oldest that I’m aware of that’s still in use to some
capacity is over 16 years old, like that’s really legacy tech.
And you know the challenging parts is how do we deal with that both from a consumer
side and from a provider side. How do we migrate that to something else? How do we kill it
when no one was thinking about that at the time? How do we make that less painful and
things like that.

And how about from an organizational standpoint
because when you’re building these APIs and you mentioned Conway’s Law earlier,
when you’re building these APIs in effect you’re memorializing how different parts
of the organization work and you’re describing the boundaries in a sense and the silos around
what different parts of the organization do and are responsible for those teams. So how
does that come into play or does it come into play or does it not come into play?

It comes into play, it’s a matter of there’s
a lot of talk about at that stage of how do we decide which system should own this new
feature, which team is responsible for this new thing, where should it live. And because
we have those conversations and we don’t just smack it onto anything, the hope is that
we’ll make a good one and we’ll put it in the best workspace. Because oftentimes
there isn’t necessarily the best option, but we want to minimize that impact and that’s
the move towards micro-services. We’re rapidly moving to make our services smaller and that
means more APIs. But as your services get smaller it’s easier for you to have at least
smaller silos and for there to be less of a cost to build something small, put it out
there and get people using it and then iterate on that.
Because moving things between teams, moving the ownership, is a lot easier when it’s
really small. And let’s say I built a small feature and it turns out my team shouldn’t
own it. It’s not really relevant. But if I built it in a small way the team or the
system that should own that it’s less of a cost for them to say, okay we will build
something that does exactly that, we’ll just do it in our thing and people can just
start using this new system, you know we can just switch them out. But if it’s larger,
if it’s a lot more complex it’s a lot harder to do.
Granted, these are enterprise problems, you know where we face real enterprise problems.
For a smaller company that probably doesn’t make sense. If you have three developers in
your building, if you have 20, if you have 100, building services this small doesn’t
make nearly as much sense especially when you don’t have 20 years of legacy systems,
and that’s like the big differentiator is like when you’re at this larger scale these
things matter more. We kind of have a business model that kind
of works. It’s kind of worked for like you know centuries at this point. You know when
you’re a small startup and you don’t even know how you’re going to make money don’t
focus on which service your API is going to live in, you should probably just ship your
code. And think about this stuff from the perspective of let’s make sure that the
API that I designed is at least good and not necessarily where it lives. Because when a
bigger company comes and says, I want to integrate with you guys, send me your API, you will
be judged on that. And that’s a big deal, because when you’re
small you can’t necessarily build new systems all the time. You want to integrate and building
APIs that you can eventually make public is a great way of doing that.

Now, in order to build these APIs don’t
you have to have a very detailed understanding of the type of content you’re putting the
API on as well as the intention behind that type of content, as well how that content
fits into the broader scheme, so you have to have real expertise on the subject matter
on the part of the newspaper you’re constructing the API on, right?

Yeah, you have to at least get as good an understanding
as you can. You’ll never know it all and you’ll never be able to for sure build the
right abstraction. However, if you meet with enough people using it and enough people who
have built similar systems that know this domain you can at least get something that’s
mostly right. And if you think about that API lifecycle where you can kill this later,
you can do a new version; you can change it. It doesn’t have to be forever if you’re
thinking about that from really day one.

And what about your working with the subject
matter experts in the various parts of the Times, how closely do you work with them,
how much support do they give you, do they understand the importance of it? Do they think,
oh this is just this technical thing, what are those relationships like? I’m sure they’re
good relationships but give us some insight as to how that all works. So one thing about the Times everyone is incredibly
nice. You know it’s a large organization but like other departments are normally more
than willing to help out, to figure out what that partnership should be. Good examples
of the teams that actually work hand in hand with experts are our cooking team. They work
directly with our editor for food, our games team works with our crossword people. They
work hand-in-hand to get an understanding what the product needs and where they envision
they should go with and how it should work. An example of a really hard API to build
was cooking; recipes are an incredibly hard domain to model because you have this
you know unlimited sizes like you know a dash of sugar. How do you represent that when it’s
actually a metric of size, but at the same time it’s like can you have two dashes of
something. It’s those sorts of things. Also just representing food and recipes can
be really hard, and that’s why we push people to spend as much time upfront designing these
interfaces, thinking about them and not thinking about building it until you’ve actually
decided, okay, how am I going to model this, like what does a recipe look like.
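
The modeling question raised here, how to represent "a dash of sugar" and whether you can have two dashes, can be sketched in a few lines. This is a reader’s illustration of one possible answer, not the schema the cooking team actually uses. It assumes imprecise units like "dash" are countable but not convertible, and that a missing amount means "to taste". Every class, field, and unit name is hypothetical.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List, Optional

# Assumption: these units can be counted (two dashes) but not converted to grams.
IMPRECISE_UNITS = {"dash", "pinch", "handful"}


@dataclass(frozen=True)
class Ingredient:
    name: str
    amount: Optional[Fraction] = None  # None means "to taste"
    unit: Optional[str] = None         # None means a bare count, e.g. 2 eggs

    @property
    def is_precise(self) -> bool:
        """True when the amount is known and the unit is a real measurement or a count."""
        return self.amount is not None and self.unit not in IMPRECISE_UNITS

    def scaled(self, factor: Fraction) -> "Ingredient":
        """Scale the amount; 'to taste' ingredients are returned unchanged."""
        if self.amount is None:
            return self
        return Ingredient(self.name, self.amount * factor, self.unit)


@dataclass
class Recipe:
    title: str
    servings: int
    ingredients: List[Ingredient] = field(default_factory=list)

    def scaled_to(self, servings: int) -> "Recipe":
        """Return a copy of the recipe scaled to a new number of servings."""
        factor = Fraction(servings, self.servings)
        return Recipe(self.title, servings, [i.scaled(factor) for i in self.ingredients])
```

Using `Fraction` instead of floating point keeps amounts like one third of a cup exact when a recipe is scaled up and down. Whether a dash should scale linearly at all is exactly the kind of question worth settling with a cook before any code is written.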
Those are the sorts of decisions that teams of all sizes should be putting that time into,
because what you’re doing is investing upfront in building with experience to work with that
data making that as easy and seamless as possible. And it’s going to be less likely down the
road that someone is going to either complain or need assistance on so it works differently
if you’ve done the work upfront to really model it correctly. So let’s talk just a little bit more on
cooking and of course the New York Times I remember when you released the cooking app
and did everything about cooking and it’s so great. I mean it’s fantastic, so what’s
the role of the cooking staff, the recipes staff versus the role of the API staff and
figuring out how to model that?

So it really comes down to the team that actually
builds that cooking app. They’re the ones that actually sit down and make the decision
on how that works. Because at the end of the day technologists are most likely going
to know best about what the technical experience is that will work best for that domain. But
working with those domain experts in cooking recipes allows them to have a great understanding
of that content and do a little bit of anticipation of okay, you know I’m going to model this
this way because I know that recipes sometimes look like this and you know we might add new
ways of presenting recipes and sizes and measurement and things like that. So they really have
to be in a way experts in how that stuff works and how like cookbooks work. Because really
it’s a cookbook on the internet and every member of that team loves cooking and is
really passionate and that really feeds into a design that works really well.

So understanding the content and being interested
in the subject matter is a significant part of designing APIs well.

It definitely helps. If you don’t know what
you’re modelling and you don’t know what you’re building it’s going to be very
hard for you to know to anticipate what other people will want from your system and that’s
key. You’re designing how people are going to use information and use content. And the
people who are requesting it they’re going to know what they want, and they’re going
to know you know from an interface standpoint what they want that information to look like
and you’re not going to do a very good job if you know less than they do. So it truly
is about being at least fluent in the language of what you’re modelling.

You know we’re almost out of time but there’s
one last point that I think is extremely important that I hope you can address for us. And that
is for an organization that wants to encourage both adoption of its APIs, but from a broader
perspective and even more importantly, if an organization wants to be a data provider
such as the New York Times of open data, what do they have to do in order to let’s say
encourage adoption?

The key is you can’t be stingy. You have
to just give it out. When we launched our developer portal there’s a lot of questions
like, are people going to be stealing our data, questions like that. Just give it away.
You don’t have to give it all but don’t be stingy, and you will find that first off
not that many people are going to use it at first. you’re going to find that out, but
the people who do, you’re going to find those passionate people who are really interested
in using your data in new ways. You will get companies like Slack which has
built their business on having great APIs. You know Slack I think got voted the number
one new enterprise product. We use it at the Times. You know the reason why they’re
sticky is because they have APIs that allow you to integrate. So you can have all of your
systems feeding into Slack and also going out of Slack.
You know, we’re building on that experience using Slack bots to talk about politics, that’s
really interesting, but those are things that started off as just people in our R&D lab hacking
around with some APIs. So you’re enabling people outside of your orb, your real users
to build the experience they want to have and if you offer the APIs to allow them to
do that they’re never leaving. Because once they’ve integrated they just put in a lot
of upfront work and it’s going to be a lot harder for someone else to build something
similar and get them to rebuild all of their existing systems.

And I would be remiss if I just didn’t follow
up and say, as you talk about not being stingy and give it all away, are there newspaper
gods up in the sky who are looking down and frowning because the newspaper business relies
on you know it pays for the development of that content and it relies on people buying
it. And now you’re advocating giving it away.

So to be clear the information that we give
is everything but article content. You can search for articles. You can find out what’s
trending. You can almost do anything you want with our data through our APIs with the exception
of actually reading all of the content. But at the end of the day, one of the best
things about being a news creating machine is that we write new content every day. And
we’re building new content and exposing it in new ways every day. So if people want
to take your content they’re going to. They’re going to scrape your website. They’re going
to find ways to resell that data. You’re not going to be able to stop them.
So it’s really about giving people the opportunity to really interact with your content in ways
that you’ve never thought of, and empowering your community to figure out what they want.
You know while we don’t give our actual article text away, we give pretty much everything
else and people build a lot of really cool stuff on top of that.

Okay wow, that’s been really interesting.
We’ve been talking with Scott Feinberg who is the API architect of the New York Times,
and boy we just sure learned a lot about the New York Times and how it works under the
cover. Scott, thank you so much for taking the time today.

Thank you for having me.

You have been watching episode number 161
of CXOTalk. Thank you for joining us, thank you to Scott Feinberg from the New York Times
for joining us, and everybody come back on Friday where we will be back again with another
awesome show. Thanks so much everybody. Bye bye.
