Making Modern Web Content Discoverable for Search

Good afternoon, everyone. I'm Tom Greenaway, a developer advocate based in Google Sydney. And I'm Martin Splitt; I work out of the Zurich office as a Webmaster Trends Analyst, which is just a fancy name for a developer advocate for search and the web ecosystem.

Today Martin and I are going to tackle the best practices for ensuring a modern JavaScript-powered website is indexable by search. But what do we really mean by that sentence? By best practices we mean what every developer should know: the techniques, knowledge, tools, and process. By modern JavaScript-powered website — I know it's a bit of a mouthful — we mean websites that use modern JavaScript frameworks for their front end and probably render their HTML in JavaScript on the client side. They're typically single-page apps, but this applies to any website that uses front-end JavaScript, perhaps powered by Ajax or WebSockets for content. And lastly, what do we mean by indexable by search? By indexable, I mean the content can be understood, and by search we mean Google Search — but these rules apply to other web crawlers too, such as other search engines and social media services like Facebook and Twitter.

So now that we know what we want to address, how are we going to split this up? First things first, it makes sense to quickly go over how Google Search actually does the indexing and figures out what content there is on the web. Then we're going to look at something that, as a web developer, I really wanted all these years — and we have it now: the tools to help you, and us, debug the things we're seeing. And last but not least, we don't want to just debug things after they happen; we want to get ahead of that, so we're going to talk about a few best practices
for indexing your content.

Cool. So with that out of the way — it's good that I have you here, because I've got something to talk to you about. That sounds a bit intense. Don't worry, it's not that intense. It's just... how do I put this? I have a friend — let's call him Marvin — and they're building a single-page web app. They've followed most of the things you do these days, like PWA best practices and all that kind of stuff, but they have issues getting users to find their content online. I see. And unfortunately that's not an isolated issue, right? Right — you ran a Twitter poll, and other people are encountering this as well. Yeah, I got a bunch of responses like "how can I check if my stuff is findable in search?" or "hey, my stuff's not showing up but I don't know why." I think that needs a bit of a deep dive and an explanation. Well, there are a lot of tools available for debugging on Google Search, but before we get into those, let's go over how Google Search sees and indexes the web. That's a fair point.

Here's a basic diagram of how websites were traditionally indexed: Googlebot, our search crawler, found pages, downloaded them, processed that content, put it into an index, and then performed more crawling based on the links it found. Okay, but what happens in the process step? That seems to be where the magic is. Well, when we fetched a web page from a URL and it was a traditional website, that web page was complete when it arrived, and we call this the rendering of the page — the construction of the HTML. So when you say rendering, you don't mean putting pixels onto the screen or dealing with DOM transitions and animations; it's about where the HTML gets constructed — like server-side
rendering versus client-side or hybrid rendering? Exactly. Traditionally, websites were rendered entirely on the server, and any JavaScript was probably just for cosmetic purposes.

Okay, but now that we've established this is about the construction of the HTML — it's no longer that way, right? We've changed our web architectures quite substantially. So what happens in the processing step today? Good question. Basically, there's now a renderer inside the process step. More specifically, we have a version of Chrome that opens the content of the page, runs the JavaScript, and then spits out the final HTML. But we also have a queue, which is quite important, and that leads into the next point, which John Mueller and I revealed at Google I/O earlier this year: in a nutshell, the rendering of JavaScript-powered websites in Google Search is deferred until Googlebot has the resources available to process that content.

Deferred? So what kind of timeline are we talking about — what's the delay? Well, it could take minutes, maybe an hour, maybe even days, or up to a week before the render is actually completed. A week? Wow, that's a shocker. I know, but you have to understand that the web is really big. It's quite huge, in fact: we've found over 130 trillion documents on the web so far — and that number is two years old, and the web keeps growing. Okay, I understand. But is there anything we could do to help the crawler a little? If I remember correctly, when I attended the session at I/O, you said something about dynamic rendering — is that something that would come in here? Exactly. Dynamic rendering is a technique that allows us to sort of short-circuit the rendering pipeline by delivering a server-side rendered version of your normally client-side
rendered website — by rendering that client-side JavaScript on the server with a client-side renderer, for example a headless browser like Puppeteer.

Oh right, that's pretty cool. So how does that work in detail? Here you can see how a server identifies that the device requesting the page is a user's browser, and then it serves a payload of HTML and JavaScript that gets rendered on the client. That's the basic case. But when a crawler like Googlebot makes a request, the server sends a different payload: instead of sending the HTML and JavaScript directly to the crawler, we run what is normally sent to the browser through the dynamic rendering service, and the dynamic renderer spits out a completely statically rendered HTML payload for the crawlers.

That's pretty smart. And to be clear, that dynamic renderer could be an external service, or it could run on the same web server infrastructure. For this kind of thing you can use tools such as Puppeteer, which you mentioned already, or Rendertron, which is a higher level of abstraction. Puppeteer is basically an npm module that you install, and it programmatically controls a Chrome instance that runs headlessly, which is great. But I like something more high-level, and I think Rendertron steps in there: you just have the Rendertron server running, which uses Puppeteer under the hood; you give it a URL to render, and you get the rendered static HTML back. Fantastic. And you can deploy it pretty easily — there's this thing called Google Cloud Platform that makes it pretty easy — but you can also deploy it pretty much anywhere else.
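The crawler-versus-browser dispatch described above boils down to a user-agent check on the server. Here is a minimal sketch — the function name and bot list are illustrative, not an official API:

```javascript
// Decide whether a request should be routed to the dynamic renderer.
// Illustrative bot list: crawlers assumed not to run JavaScript, plus
// Googlebot so it can skip its render queue.
const RENDERABLE_BOTS = [
  'googlebot',
  'bingbot',
  'twitterbot',
  'facebookexternalhit',
  'linkedinbot',
];

function shouldDynamicallyRender(userAgent) {
  const ua = (userAgent || '').toLowerCase();
  return RENDERABLE_BOTS.some((bot) => ua.includes(bot));
}
```

A server would branch on this result: matching crawlers get statically rendered HTML from the dynamic renderer, while regular browsers get the normal client-side payload.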
Do you have an example? I do, actually. There's an npm module called rendertron-middleware, so if you're using, say, Express, you can use it as middleware. What you do here is: first, you require it to get it into your project, and then you configure it to do the right thing for you. In this case we specifically want to jump the rendering queue for Googlebot. Rendertron by default doesn't pre-render for Googlebot, because Googlebot does run JavaScript — but we want the advantage here anyway. So we take the pre-configured, pre-built list of bot user agents that it pre-renders for and add Googlebot to it. Once we have that configuration ready and imported, we can use it in our application's middleware stack: we point it at the running Rendertron server and say, for all these user-agent patterns, pre-render.

By the way, now that I have you here — because you never respond to emails in a timely manner, which is fine; neither do I, no offense — and Chrome Dev Summit is a big event: this rendering does cost resources, so I'm wondering, is there any way to verify that a request really is coming from Googlebot, and not just someone pretending to be? Well, the easiest way is user-agent sniffing for the Googlebot string — here's an example with the mobile user agent for Googlebot — and you might want to do this for other services as well, like social media services you want to serve pre-rendered content to. For Googlebot you can additionally do a reverse DNS lookup to confirm the request really is coming from a Google server. And like I said, this is the mobile user agent; you can detect the desktop user agent as well.
The URL on that slide gives you a list of all the different crawler user agents, which is nice.

Which reminds me — I should sync with John Mueller and check if there are new tools in Search Console for this stuff. Why do you bring John into this? Come on, I'm right here, literally a meter away from you — or however many feet or inches that is, no idea. Basically, we have a bunch of stuff for you, and I'd like to walk you through it.

You know the Google mobile-friendly test already. It's quite useful: it shows you if your page is mobile-friendly, it gives you a screenshot of what Googlebot is seeing, and it's easy to use — you just paste your URL in. But it does more than that, because it also gives you something I always wanted: when Googlebot doesn't render what you expect it to render, you get the JavaScript console messages that you would get in Chrome DevTools, so you can really debug the JavaScript. And here's my favorite — we had an "is it undefined?" question earlier; apparently it is undefined, and "undefined is not a function," which is unfortunate, but that happens.

Also, do you know about the new URL Inspection tool in Search Console? I don't think so — can you remind me? Sure. If you have a verified property in Search Console, you can paste any URL belonging to that property into the URL Inspection tool, and you can see when we crawled it, whether it's on Google or not, and a bunch of other information, and you can run a live test as well. This blog post here is — drumroll — not in the index, but that's fine, whatever. We also have something else; it's actually a pre-announcement that we're making at Chrome Dev Summit right now, so can we get a little bit of a
drumroll? We're getting a live code editing feature in Search Console. Fair enough.

So imagine you were building a website — say there's an after-party today, so you build the website for the Chrome Dev Summit after-party — and you want to check whether your structured data works so you get highlighted in search results. You want to check that as quickly as possible. That makes sense: I want to iterate quickly; I don't want to have to wait for deployments. Exactly — a development cycle that makes more sense.

So you can put the page into this tool. In this example we're using JavaScript to create a script tag that contains the structured data, and we have all our structured data for the event. We click the little "test code" button, and it tells us our event markup is valid — but in the code editor over here, we can see it's missing the performer for the after-party. Oh, that's a shame; I think it's meant to be the Chrome Dino. It is the Chrome Dino, so we should add the performer. We can go straight back into the code editor, add it, click the button again, and it live-updates as we retest our code. We can do all of this in the browser, in a single tool, which I think is pretty fantastic. That's definitely a neat way of testing and trying out code really quickly.

That's something Search Console generally tries to do — establish a nice flow. So let's say someone in your company or agency has access to Search Console. As a developer, I normally don't have access, because I have so many other things to worry about. Then someone finds an issue through one of the reports. How do they get this information to me?
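As an aside, the structured-data injection demoed above — JavaScript creating a JSON-LD script tag for the event — might look roughly like this. The event details and function name are illustrative, not the talk's exact code:

```javascript
// Build the JSON-LD object for the event (details are illustrative):
function buildEventJsonLd() {
  return {
    '@context': 'https://schema.org',
    '@type': 'Event',
    name: 'Chrome Dev Summit After Party',
    startDate: '2018-11-12T19:30',
    location: { '@type': 'Place', name: 'San Francisco' },
    performer: { '@type': 'PerformingGroup', name: 'Chrome Dino' },
  };
}

// In the browser, the demo attaches it with a script tag:
//
//   const script = document.createElement('script');
//   script.type = 'application/ld+json';
//   script.textContent = JSON.stringify(buildEventJsonLd());
//   document.head.appendChild(script);
```

Because the tag is created client-side, this is exactly the kind of markup you want to verify in a renderer-aware tool rather than by viewing the raw page source.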
Well, one way is that they can go to the issue on the reports page, where there are a bunch of sample URLs — in this case, "content is wider than the screen." In the second stage, they can say: our developer might be an external developer, and we don't want to give them access to all the data, so we share this particular issue with them. They get a link; they don't have to sign in or anything. They can just use the shared link to see what the issue is and get access to the documentation that explains the problem and how to fix it. And last but not least, when I as a developer say "I fixed this, I have this under control" — and we know that's often not true — Search Console offers a Validate Fix button. So if I say "Tom, I fixed this," you or I can press this button and have it re-checked. It really establishes a workflow, and it works across departments, which I think is fantastic as well.

But actually, there's another addition to our developer tools — do you want to talk about that? Exactly. I'm sure everyone knows about Lighthouse — you've been to the foyer space and seen the awesome statue we've got there. Well, there are actually SEO audits inside Lighthouse now, with a few more coming soon. They can automatically detect things like whether your HTTP response headers are correct; whether your meta tags are right, with correct title and description tags; whether you've got hreflang set up correctly, if you're using it for localization; and whether your links have descriptive link text. Avoid "click here," because that doesn't communicate what you're actually linking to — "number five is going to surprise you." They also check robots.txt, and the new features we're
adding include automatic detection of tap target sizes and the margins around tap targets, to make sure they're nice for users, and structured data testing as well, which we were just talking about. Fantastic.

All right — so on the tools side we have the Lighthouse SEO audits, Search Console, the mobile-friendly test, and the Rich Results test with its editing features. I think we're pretty good on that front. But do you have any recommendations in terms of best practices that I should pass on to my friend Marvin? He's a friend of mine, seriously — totally not me. Any best practices we should recommend to them? Yes, let's go through a few.

Firstly, remember how I said that Googlebot is running Chrome nowadays? Fantastic — finally a modern browser. Well, wait a second: it is Chrome, but not the latest version. It runs Chrome 41. Chrome 41? Not even 42, the answer to life, the universe, and everything — not quite there yet, but Chrome's working on it. Seriously though, Chrome 41 was released in 2015, so it doesn't support all the features of modern browsers. For example, it doesn't support ES6, so the latest language features aren't available; and while it has Web Components, it's version zero of the spec. Another thing to note is that it's stateless, which I'll explain in a bit.

I'll break it down. For example, in this code, how many ES6 features can you spot? More than zero. Exactly — we might forget that some of these are relatively recent features, like enhanced object literals, const declarations, and template literals (backticks) with variable substitution.
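A snippet along those lines might look like this — every commented feature below is ES6/ES2015, and none of them parse in Chrome 41:

```javascript
const browser = 'Chrome 41';            // const declaration
const year = 2015;
const release = { browser, year };      // shorthand (enhanced) object literal
const describe = ({ browser, year }) => // arrow function + destructured parameter
  `${browser} shipped in ${year}`;      // template literal with substitution
const message = describe(release);      // -> 'Chrome 41 shipped in 2015'
```

The point of the exercise: code like this looks unremarkable to a developer in 2018, yet a 2015-era browser engine cannot even parse it, so the page's JavaScript fails before any rendering happens.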
One way to deal with this, if you need a solution, is to use something like Babel, which lets you transpile ES6 code down to ES5 automatically. You can easily compile a set of files or a directory, and compile them into a single file for serving. Using presets, you can also specify a minimum browser version as a baseline, so that ES6 code goes to browsers that can support it, and browsers that can't get the transpiled ES5 code. That makes sense.

Now, while Chrome 41 does have Web Components, it's an older version of the spec than you're probably used to. After Chrome 41 shipped, several features such as Custom Elements and Shadow DOM had changes made to their specs, so depending on exactly which version-zero features you use, migration might be very simple or might get more complicated. The most important thing is to be aware that there are differences.

And lastly — this probably shouldn't come as too much of a surprise — Googlebot basically doesn't have any memory. What I mean by that is that every time it accesses a web page or an origin, it acts like it's the first time it has ever encountered that website. To achieve that, we turn off a bunch of things: there are no service workers running, so there's no service worker cache, and there's no local storage or session storage and so forth. Which makes sense: if you click on a search result, it's like coming to that page for the first time, so Googlebot encounters the first-time user experience.

Do you have any suggestions for how we can substitute for some of these things? For things like Web Components and features like IntersectionObserver, I think polyfills are a good way — once you've already done your homework on progressive enhancement or graceful degradation.
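A minimal sketch of the Babel setup described above, assuming `@babel/preset-env` with a browser-version target (the exact config file name and targets are illustrative):

```json
{
  "presets": [
    ["@babel/preset-env", {
      "targets": { "chrome": "41" }
    }]
  ]
}
```

With this in a `babel.config.json` (or `.babelrc`), running the Babel CLI over your source directory emits only syntax that Chrome 41 can parse, while browsers that meet the baseline still get the lighter output their version allows.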
But the problem with polyfills, I feel, is that there's a bit of a risk of shipping dead code to people — giving them a bunch of stuff over the network which, depending on where they are and what data plan they're on, might be pretty expensive and time-consuming. So you want to reduce that, and I really like the service polyfill.io. polyfill.io sniffs the user agent of the browser requesting it, so if the crawler comes by and it looks like Googlebot on Chrome 41, it serves the polyfills that browser needs, while a user on a more recent Chrome or Firefox or Edge or whatever gets little or nothing. Neat — so basically it tries to give you just the amount of code you need to make things work, which is pretty fantastic.

Is there any place where I can find out more about these feature gaps — where features are missing in Chrome 41 that exist in modern browsers? Definitely: check out caniuse.com. It's great for this sort of thing, because it tracks features across various browsers, and you can specify specific browser versions — you can ask what's different in Chrome 41 specifically. Cool, that's pretty nice.

And the golden rule of indexability, and of building websites for search crawlers generally, is to just make sure you test really frequently. Test, test, test. Fair enough.

So, going back to dynamic rendering for a second, and to the tools discussion: if I test my stuff and figure out that a feature is really tricky to work around, and I don't want to change my code, I guess I can use dynamic rendering for that. But there are trade-offs, and situations where you shouldn't be doing it. So which sites should look into dynamic rendering?
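As an aside, wiring up the polyfill.io approach mentioned above is typically a single script tag; the feature list here is illustrative:

```html
<script src="https://polyfill.io/v3/polyfill.min.js?features=default,IntersectionObserver,fetch"></script>
```

The service reads the requesting user agent and returns only the polyfills that browser actually needs — a modern browser gets an almost empty response, while an old engine like Chrome 41 gets the full set.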
Generally speaking, because the rendering queue can introduce delay: if your site has lots of frequently changing content — say you're a news publisher with breaking news, and articles are coming out and changing very frequently — you might want to look into dynamic rendering to overcome those delays. And if your pages use features that aren't available in Chrome 41, and it's not possible to work around those limitations in the short term, then dynamic rendering is a useful workaround until either Googlebot catches up or you have the time to adapt your own implementation. That makes sense.

Also, while Googlebot supports JavaScript, other crawlers might not. For example, if your site relies on social media interactions a lot, those crawlers tend not to run JavaScript. If I share a link on social media, I want a nice preview card to be created, and the crawler probably tries to access the image, the title, and the meta description. If you're using client-side rendering for those things, the crawler might only see the bare template — no content, no image in the preview — which would be bad. So to get a better representation there, you can serve those crawlers a dynamically rendered version of the page as well.

Sounds pretty good. But ultimately, dynamic rendering is a powerful technique, and still a workaround. Martin, do you know if we've got plans to improve this on the Google side? Way to put me on the spot. I don't like to make predictions on that kind of stuff, but definitely: the way our infrastructure is set up, there's a bit of a gap between when we execute the JavaScript and when we do the rendering, the indexing, and the crawling, so we're trying to bring these closer together,
which is an interesting technical challenge. At the same time, we're trying to catch up with Chrome — but we don't want to just catch up, because then we'd have the same conversation in a few years: "Chrome is running version 70 now; when does Googlebot get into three-digit land?" So we're working on a process — hopefully starting very soon, though I make no promises on when — to stay up to date with Chrome, so that Googlebot follows the Chrome release schedule and we basically give you an evergreen Googlebot. That would be fantastic. It all goes hand in hand, really, because we have to touch the rendering infrastructure anyway, so we might as well do both things — though we might get one update quicker than the two together. We'll see.

All right. Thank you so much for walking through how the indexing works and all those bits and pieces — that was a bunch of new stuff, and I hope it helps with Marvin as well. Marvin is going to love what I'm going to tell him, I'm sure. And thanks for showing me all those Search Console tools.

If developers need more support and help, like Marvin, they can go to our documentation, which is expanding quite quickly — we have good documentation already there and more coming. We have webmaster hangouts, so you can pop by, talk to us over a Hangout, and ask us questions; they're recorded, so if there was an interesting question in a past one, you can find it there as well. We have the Webmaster Forum, where a bunch of fantastic people called Product Experts are helping out if anything comes up with Search, and
last but not least, if you've been to the foyer space, we have a Search Console booth where you can try out Search Console and get stickers and stuff — so definitely check that out tomorrow. That's a bunch of resources. Cool. Well, thanks everyone — thank you very much for staying with us this long.
