English Google Webmaster Central office-hours hangout

JOHN MUELLER: All right. Welcome, everyone, to today’s Google Webmaster Central Office Hours Hangout. My name is John Mueller. I’m a Webmaster Trends Analyst here at Google Switzerland. And part of my job is to talk to Webmasters like you guys, and make sure that the information that we have goes in your direction, and feedback from all of you comes back to our teams as well. So we have a bunch of questions that were already submitted. If one of you wants to start, feel free to go ahead and grab a question.

MALE SPEAKER: Hey, John.

JOHN MUELLER: Hi.

MALE SPEAKER: Hi. I had a question and a suggestion, actually. The question is the following. So at cardhub.com, we got hit by the latest Panda update. And we are literally clueless on where the thin content may exist on our site, especially because over the past year and a half, we have cleaned up a lot. And we are regarded, in the credit card space, as kind of the authority in terms of credit card content. And so I was wondering whether, either now or offline, you may be willing to– I know Barry had some similar concerns on his blog– but whether you may be willing to point out some areas that may be considered thin content by the algorithm.

JOHN MUELLER: OK. I’d have to take a look at your site in detail to say much more. But I can take a quick look afterwards. Or if you have a thread in the Help forums, you can send me that link. That would be useful to me.

MALE SPEAKER: OK. Great. And then the suggestion is– and again, I don’t think it’s linked to Panda. But leading up to Panda, there were a few websites that literally scraped every single page of our website. And so our Google Webmaster Tools got filled, and continues to get filled, by junk. Essentially, these websites have thousands of pages, thousands of links pointing to our website that are 404s. In Google Webmaster Tools, there’s just not a good way to clean that up and make it stop reporting, so that we can see the good things and not get a bunch of spam every day.

JOHN MUELLER: Well, where do you see that in Webmaster Tools?

MALE SPEAKER: On the 404s.

JOHN MUELLER: On the 404 pages. So it’s basically linking to pages that don’t exist on your website anymore?

MALE SPEAKER: They never existed. The way they scraped the content, their link references for some reason got messed up. So they literally have thousands of pages linking to us, to 404 pages that never existed. And for some websites, we have even contacted the host providers, and we have brought down the entire website. One of them actually was hosted by Appspot. And you guys were quick to take it down. But the 404 notifications just keep coming in.

JOHN MUELLER: OK. So there are two things probably worth looking at there. On the one hand, 404s don’t cause any problems for your website. It’s not that this would be negative for your website, in general. You might have seen that before. So that’s at least good to know. The other thing is in Webmaster Tools, the 404s that we show are prioritized by their impact. So if the top 404s in Webmaster Tools are really random 404s that you don’t care about, that means we don’t have anything important that we can show you. We found these 404s while crawling, which is kind of a technical thing that we do. But it’s not that we found anything more important that you need to fix with those 404s. So that’s a good sign there, too.

MALE SPEAKER: No, absolutely. My suggestion was that it is a little bit frustrating when using the user interface. You essentially have a lot of junk coming in. And if there was a way, in the interface, to communicate and say, hey, don’t show me 404s from this domain, I don’t care where they’re linking, I have disavowed them, et cetera. Or some other way, maybe another way to mark it as fixed. That could be helpful. And that was where the suggestion was coming from.

JOHN MUELLER: OK. That’s a good idea, yeah.

MALE SPEAKER: Hey, John. While we’re talking about Panda, can I squeeze in one other Panda question?
JOHN MUELLER: Sure.

MALE SPEAKER: So HubPages– we recently brought in a ton of content from Squidoo. In early September, we started bringing over content from Squidoo, and it was finished a few days before Panda 4.1 rolled out. And we’re really trying to figure out where– we’ve been working hard to improve the site. Could all that content that we brought over in that short period of time have impacted Panda for us?

JOHN MUELLER: So just having a lot of new content wouldn’t be a problem. That, by itself, wouldn’t be an issue. So it’s definitely not a matter of a technical issue, where you’re importing a lot of content, or where you’re redirecting a lot of content to your site. That’s something that our systems should be able to handle. That shouldn’t be a problem from our point of view. With regards to the quality of the content, I guess that’s harder for me to say, because it really depends on the old and the new content and how well that works within your website. If you’ve been making significant changes right around when this was happening, it might just be that this is like a fluctuation that’s happening in between. And when things settle down– I’m assuming things are in a stable state now– that’s something that should be reflected in the search results as well. So you’ll probably see some changes as that settles down and as the new data is updated again.

MALE SPEAKER: So we should continue just to clean up what we were working on and then wait for our next refresh?
JOHN MUELLER: Yeah. The whole quality issue is something I’d always continue working on, trying to find ways to recognize really high-quality content, and to recognize lower-quality content that you could, for instance, noindex. If you want to keep it on your site, that’s fine. If people still navigate within your website to find that content, that’s fine, too. But if you can recognize that it’s lower quality, maybe putting a noindex on it helps us to focus on the higher-quality content within your website.

MALE SPEAKER: OK. OK. And nothing changed in general with 4.1 in how sub-domains are treated as separate sites?

JOHN MUELLER: Nothing that I’m aware of there, no. But that’s something where if we can recognize that these sub-domains belong to the main site, we’ll treat them more like a main site. If we can recognize that they’re essentially independent websites, we’ll treat them like independent websites. And in a lot of cases where you have user-generated content with sub-domains for individual users or for different, let’s say– I don’t know, like, topical areas, that’s something where we’d say that’s probably more like a separate site than just one site. I guess topical areas is kind of borderline there. But especially when you’re looking at something with different users, that would be a good reason for us to treat those as separate sites. For instance, on Blogspot, everyone has a different sub-domain. And we don’t say this is all one big website, Blogspot. This is essentially a lot of different websites.

MALE SPEAKER: John? Can I?

JOHN MUELLER: Sure.

MALE SPEAKER: Thank you. John, we have been pharma-hacked. We cleaned the site, I think. You never know. But now we are looking at another step. Results have a clean cache. But the snippet is still compromised. Can we still consider the site clean, or is there something else?

JOHN MUELLER: I would use the Fetch as Google tool to test what your pages look like now. So to some extent, you’ll see that directly when you visit those pages. But sometimes these hacked pages cloak [INAUDIBLE].

MALE SPEAKER: I used Fetch as Google, and the pages are clean. But my problem now is the snippet. The snippet on the results page can be different even if the page cache is clean and Fetch as Google is clean. Can snippets still be old and compromised?

JOHN MUELLER: Sometimes it happens that the snippet is a little bit delayed from the rest of the site.

MALE SPEAKER: OK.

JOHN MUELLER: So that might be happening there. But it shouldn’t be something that lasts, let’s say, longer than a week. If it’s older than a week and you’re still seeing the hacked content in the snippet, then that might be something to send to us to take a look at.
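The point about hacked pages cloaking can be illustrated with a rough check, sketched here in Python. It fetches a page twice, once with a browser-style user agent and once with a Googlebot-style one, and reports whether spam terms appear only in the crawler's copy. The term list and the browser user-agent string are assumptions made for illustration. This is not a substitute for Fetch as Google, since a hack may cloak by IP address rather than by user agent, in which case this check would see nothing.

```python
import urllib.request

# Illustrative user-agent strings. The browser one is a shortened placeholder.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical list of terms typical for a pharma hack; adjust to what you saw.
SPAM_TERMS = ("viagra", "cialis")


def fetch_html(url, user_agent, timeout=10):
    """Download a page while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def spam_terms_found(html, terms=SPAM_TERMS):
    """Return the sorted spam terms that occur in the HTML (case-insensitive)."""
    lowered = html.lower()
    return sorted(term for term in terms if term in lowered)


def looks_cloaked(browser_html, crawler_html, terms=SPAM_TERMS):
    """True when some spam term shows up only in the crawler's copy of the page."""
    browser_hits = set(spam_terms_found(browser_html, terms))
    crawler_hits = set(spam_terms_found(crawler_html, terms))
    return bool(crawler_hits - browser_hits)


# Example usage (needs network access; the URL is a placeholder):
# url = "https://example.com/some-page"
# print(looks_cloaked(fetch_html(url, BROWSER_UA), fetch_html(url, GOOGLEBOT_UA)))
```

The comparison is kept separate from the download so it can be run on saved copies of a page as well.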

MALE SPEAKER: OK.

JOHN MUELLER: If it’s just that these pages still rank for the hacked content, then that’s kind of normal, where we’ve seen this hacked content on your pages. And if you do a site query and add maybe Viagra to the site query, then maybe we’ll still show those pages, because it used to be relevant there. But that’s not something you’d need to worry about.

MALE SPEAKER: OK. Thank you very much. Thank you. Bye-bye.

MALE SPEAKER: Early in this broadcast– whatever you call it– you said, basically, that the 404 pages are ordered by importance. Did you say that?

JOHN MUELLER: Yes.

MALE SPEAKER: They are. Is that the first time you’re saying something like that? You don’t know. Or are you [INAUDIBLE] on the chat?

JOHN MUELLER: I think we announced that when we announced the feature in Webmaster Tools, however many years ago. I think you were [INAUDIBLE]. Sorry? OK, you’re muted now. [LAUGHS] So we show them in Webmaster Tools, the 404 pages. And we sort them by priority, by default. So that’s something where you’d see that there. And I’m just equating priority with importance, where if the top 404 pages that you see in Webmaster Tools are really random pages that never existed on your site that you don’t really care about, then that’s a sign that we haven’t found anything more important. And the priority– I think we mentioned some of the factors there in the blog post when we announced it. That includes things like, is this URL in your sitemap [INAUDIBLE]? Is this something where you’re seeing search traffic to that URL, or where you were seeing search traffic? Those are the kind of things we’d look at for the priority. So essentially, if you’re telling us that this is actually an important page and it returns a 404, then that’s a sign that we should put it higher up on the list.

MALE SPEAKER: Thank you.

JOHN MUELLER: All right. Let’s grab some of the questions from the Q&A. “Any news about the Penguin update– potential release date, new factors, et cetera?”
At the moment, I don’t have anything to announce with regards to Penguin. I know the engineers are working on something that should be available fairly soon. But I don’t have any specific time frame for when that might happen. So not today, not tomorrow. But I’m guessing fairly soon. Certainly by the end of the year is my current guess. And a lot of the time frames, with regards to new algorithms or features or bug fixes, those kind of things, are always a bit tricky when you’re looking at something the size of Google’s web search. Because on the one hand, we have to make all of these changes, and run these algorithms, and go through the test data to make sure that we’re getting things right. On the other hand, we also have to review the data that’s generated there and make sure that it’s actually useful data that’s really providing useful additional information in search and not just something that we’re essentially running, but that doesn’t really have any [INAUDIBLE] value in there. So when we’re somewhere along in this process, maybe we’re working on the algorithm, or maybe refining the data that’s used to create the algorithm, or that’s created by the algorithm there. Then it’s really hard to estimate when it’ll actually be ready and when it will be live. So if everything goes well, usually, that’s a process that’s quite a bit faster. If something in between goes wrong where we say, oh, we have to rethink what we’re actually doing here, then maybe it will take a week longer. Maybe it will take a month longer. It’s really hard to say.

BARUCH LABUNSKI: The problem is you’re hearing a lot of disingenuous Webmasters, and people are saying, yeah, it’s already here, it’s there. So in this Hangout today, is there a way you could just be a bit more transparent? People are just saying Penguin is already out without–

JOHN MUELLER: This is something where we do run tests from time to time to see how things can react, where we do live tests. We do that a lot with features in web search, for example. That’s one place where that’s particularly visible, where we’ll do maybe a test on 1% of the traffic, or maybe a test on, I don’t know, 3% or 4% or 5% of the traffic with our updated data, with our updated algorithm, and see if that really brings out metrics that show whether this is actually a good change or not. So those are the type of things I would expect people to see from time to time. Whether it’s specifically from this Penguin algorithm or not is kind of uncertain, because if you’re not looking at specific UI elements in search, you don’t really see what specifically has changed there.

BARUCH LABUNSKI: OK. All right, John.

MALE SPEAKER: So just to be clear, Gary said last week that if you disavowed anything from the past two weeks or so forward, those disavowals will not be included in the upcoming Penguin algorithm. He said that clearly. I don’t know if you heard it or not, but–

JOHN MUELLER: I wasn’t there. Yeah. [LAUGHS] I don’t know. It’s something where I imagine you’re currently right about at the edge where the engineers will say, well, we have to take the data from somewhere, take a cut, and work with that. So it’s quite possible that we’ve reached that point or gone past that point. But I wouldn’t say that it’s useless to disavow things now, because there are always changes happening in search. And if you know of bad links pointing at your site, that maybe a previous SEO built, or that maybe you built accidentally without being aware that they were bad at the time, then that’s something you can always disavow now, even if maybe the current Penguin algorithm has already run. So I wouldn’t worry about a cut-off time and say, I’m not going to touch anything now.

MALE SPEAKER: Right. Because he did say it’s going to run faster, so obviously, the next time they run it, those will be counted. The second thing is, do you know who at Google is going to go ahead and confirm this on the Google+ page, so we can keep refreshing constantly?
JOHN MUELLER: I don’t know.

BARUCH LABUNSKI: Could be John.

JOHN MUELLER: I’ll definitely re-share whatever we announce, yeah. These are the kind of things– we know you guys are waiting for this. We know that a lot of people have put in a lot of effort to actually clean up these issues. There are probably still a lot of sites that have never bothered cleaning up, that have continued doing their spammy things. And that’s something where they probably won’t see those changes. But at least I know there are a lot of really well-intentioned Webmasters out there who noticed this was a problem, saw the issue coming up, and worked to clean that up. So we’re happy to get the word out when it does come out.

BARUCH LABUNSKI: But even after cleaning up for Penguin, that doesn’t mean that the site will come back to life. There’s so much other stuff that the Webmaster should have looked at, right?

JOHN MUELLER: Oh, definitely. Yeah. You can never really say that my site was ranking number two one year ago and this algorithm brought my site down to number ten. Therefore, if this algorithm were reverted, my site would be back at number two. There are so many things that happen in web search over the course of a year that you can’t really expect things to be exactly the same if you’ve cleaned up this problem. And the other thing to keep in mind is maybe your site was unnaturally ranking higher at that point. So if there were, for example, a lot of unnatural bad links, and we had incorrectly counted those for your site at that time, it might have ranked higher than now. Whereas now you’ve removed all of those bad links, and we essentially have the current state. So that’s something also worth keeping in mind. But I know there are a lot of people out there that have spent a lot of time to clean things up and really improve their websites across the board. So I’m hoping that those kind of sites will be seeing a nice jump as well. But as always, we have a lot of different algorithms. And to some extent, if you’ve been working to clean up your site, you should have been seeing some of those positive changes, as well, in the meantime. But I know this is a tough topic. So I’m happy to keep you guys informed about this. But I’m also happy to look at some of the other questions that we have in our Hangout. So let’s take a look at what else we have here.

MALE SPEAKER: John, can I just quickly chime in and ask you one thing about that?

JOHN MUELLER: Sure.

MALE SPEAKER: I put a link at the very top of the chat. And I also sent it to you in an email just recently. And it’s a spreadsheet with a thousand– well, probably not a thousand, but a couple hundred links, probably, in it, or something like that. And it shows lots of PR6 sites– very, very spammy sites– selling links. And they have been– I reported it over a year ago. I’ve sent it to you a few times, random examples. But this is a full list I spent a long time on. And I’m concerned as to why there are so many PR6 sites that are really, really spammy, getting away with it. And forget about the people who are buying from them. I’m wondering why this isn’t being dealt with. Has it been dealt with, and we’re just still seeing the PR on there? But I’m still seeing those sites ranking very well. And they look like examples that you always say might have slipped through the cracks. And I can’t understand, for the life of me, why it’s taken a year, with multiple requests, to have them removed, and nothing’s– well, not removed, but to be dealt with. It seems like a serious concern. They are really spam.

BARUCH LABUNSKI: Are they [INAUDIBLE] junkie sites?

MALE SPEAKER: They’re junk sites that have been selling PR6s, loads of them.

JOHN MUELLER: There are a few things where we do take action that you might not see directly. On the one hand, PageRank is something that we haven’t updated for, I think, over a year now. And we’re probably not going to be updating it going forward, at least the toolbar PageRank. So it’s really hard to take that information and work on that. And we have a lot of ways to recognize these problematic links and to treat the sites that are selling those kind of links in a way that essentially blocks the PageRank from passing from those sites anyway. But I didn’t get your email recently. You mentioned it, I think, on Google+, but I didn’t get anything. But I’ll definitely take this list and go through it with the Web Spam team to see if there’s anything there that our algorithms or the Web Spam team at the moment has missed out on.

MALE SPEAKER: OK. Yeah, I’d appreciate that, because if I’ve got it completely wrong, I’d rather know that and not bother you or waste my time looking to help in that kind of way.

JOHN MUELLER: Yeah. It’s something where we’ve also internally discussed how we can make it easier for people who are reporting web spam issues so that they, on the one hand, understand what kind of issues we are actually taking action on. And on the other hand, we can also recognize these people and say, well, this guy is really good at reporting web spam. We should take his reports a little bit more seriously. And that’s something where, at the moment, we don’t really have the kind of infrastructure to do that automatically. But I know the Web Spam team is looking into that to see what we can do to take these web spam reports a little bit easier and a little bit– such that we can take action on them a little bit faster and a little bit more visibly maybe, so that at least the person who’s reporting them knows this was useful, or this was essentially irrelevant for us.

MALE SPEAKER: Sure. What we can say for sure is that they’re selling dofollow links. That’s for certain. Whether or not they’ve got any power, sure, that’s something we haven’t got a clue about. But I appreciate that. Thanks, John.

JOHN MUELLER: Sure. All right. “Yoast recommends that all affiliate links be masked. He recommends that we have a separate folder for affiliate links blocked by robots.txt. Is it OK to do so? If affiliate links are already nofollow, should we redirect them either way?” Essentially, from our point of view, we want affiliate links not to pass PageRank. So if you’re blocking them with a nofollow, that’s fine. If you want to block them with a robots.txt file, that’s fine, too. That’s not something that we’d say you need to do. But if you can’t, for example, use nofollow because of your CMS, then maybe going through a redirect script that’s blocked by robots.txt is a good idea. But essentially, the main point from our side is, they shouldn’t be passing PageRank. And you can do that by robots.txt, or you could do that by nofollow.

MALE SPEAKER: On that point, John, amazon.com, their affiliate program, those are all links pointing directly to their website with a URL parameter. Does that mean that for a site that big, they’re not getting any credit for any link pointing to them with an affiliate tag?
JOHN MUELLER: We do try to recognize the dominant affiliate systems ourselves. And we take action on those links directly as well. So that’s something where if we can recognize that everyone is doing it wrong, and everyone is linking maybe without the nofollow for these specific programs, we’ll do that transparently on our side. So they wouldn’t be penalized for those links. It’s also not the case that we ignore all links to Amazon because of that. But we try to recognize the ones that are actually affiliate links. And like you said, with an affiliate parameter, that’s something that we can recognize and follow up on there. And this isn’t something where we’d say we’re searching for these sites and trying to penalize them. We’re not trying to penalize affiliates in any way. You can have a fantastic affiliate-based website, and that’s perfectly fine. We just want to make sure that if you’re an affiliate-based website, you actually have some unique and compelling content of your own, and that the value of your website is with the content on your website, not with this affiliate link, essentially.

MALE SPEAKER: Got it.

JOHN MUELLER: “If Webmaster Tools shows that we have 404 pages, what should we do? Should all 404s be redirected to the home page, or might this be a bad thing to do?” Essentially, if you have 404 pages, I’d take a look at the list in Webmaster Tools and definitely look at the top ones there. And if you can tell that these are all really random 404s, then that’s a sign that we haven’t found anything really important as a 404 on your site. On the other hand, if you can see that within this top list, we have URLs that you actually want to have indexed that maybe accidentally returned a 404, or that you accidentally removed, and you didn’t realize that you had deleted these pages instead of renaming them or moving them around, then that’s something I’d recommend fixing. And having a 404 on pages that you have removed that you don’t want to have indexed is completely fine. Or having a 404 on a URL that never existed before, that’s perfectly fine, too. That’s not something that you have to mask or hide with redirects. The 404 is a perfectly normal technical tool to use on a website.

“Can I have two separate websites with two separate locations use almost identical content? It’s legitimate content, not spam. One business is located in one state and another one in another state. How can I be sure Google won’t penalize these sites?” From our point of view, duplicate content is primarily a technical problem in the sense that when someone is searching for that content that’s shared across those two websites, we have to pick one of those URLs and show it in search. We won’t show both of them if we think they’re essentially duplicates. So we pick one of those and show it in search. And that would be the one that would be visible there for that specific query, for that specific set of content. So it’s not the case that either one of these would be penalized, but we just wouldn’t be showing both of them at the same time in search. And sometimes there are completely legitimate reasons to have multiple websites like this. That’s not something I’d really worry about. What I would try to keep in mind there is that you want to limit this to a reasonable number of websites. A handful of pages, a handful of different locations is fine. If you have a business that’s active in every city in the country, then I wouldn’t create different websites for every city. We kind of see that more as being spam. So if you’re essentially just randomly creating these pages and just trying to stuff them into the search results, that would be kind of spammy. If these are really two legitimate businesses that you’re running that happen to share some of the content because they’re essentially led by the same business, then that’s fine.

“I’ve been developing my current domain for approximately five years. It has a dot net extension. I was recently able to purchase the dot com and dot org versions of the same domain. The dot com has a 13-year domain age. What should I do? Redirect my links?” So essentially, I’d just pick one of those domains that you want to keep using and redirect your content there. So if you want to move to your dot com because you think that’s a better choice of a domain, that’s fine. That’s something you can do. If you want to stick to your dot net and just redirect the other two domains to your dot net version, that’s fine, too. One thing to keep in mind is every time you move from one domain to another, you’ll have a certain period of fluctuations as things kind of move around. And it takes a certain amount of time for that to settle down. So if you want to move to a different domain, maybe do that at a time when your business, seasonally, is not as active as it otherwise might be during the year. And another thing to keep in mind is that when you do move to a different domain, there are some things that kind of get lost or kind of diminish there, in the sense that we will try to pass along as many of the signals as we can. But some of those are tied to your old domain. And we don’t pass those along. So it’s kind of normal to see a tiny drop when you’re moving from one domain to another, but it’s probably not something that you’d really be able to track. So I wouldn’t just randomly move around your domains just because you happen to have them. I’d really think about where you want to be in the long run and make sure that you’re focused there for the long run. With regards to the age of these domains, that’s not something I’d worry about there. That’s not something that our algorithms would be taking into account specifically there. So I wouldn’t say just because one domain is older than the other one, you shouldn’t use that one. But you might think about this from a branding point of view, where you say dot com is more recognized than dot net. So maybe I’ll move to dot com. But that’s not something I’d say you’d need to do from an SEO point of view.

BARUCH LABUNSKI: But according to your patents, there is something where a domain is old, so you take that into consideration, no?

JOHN MUELLER: I mean, one thing that kind of happens is if you have a website running for a really long time, then you will have collected a lot of signals over those years. So that’s something that we kind of keep there. On the other hand, if you’re moving to a different domain, then we try to recognize that type of move, and we treat it as a site move. We don’t say, well, everything that’s been collected from this old domain is therefore valid for this new domain that’s actually moving there. So that’s the kind of situation where naturally, you’d collect things over time. But that doesn’t mean that you can just combine them with random other domains and say, oh, well, now my site has a combined age of 25 years, and all those signals apply to my site just because they’ve been there. So I wouldn’t necessarily just move to an older domain just because it’s older.

MALE SPEAKER: Hey, John.

JOHN MUELLER: Hi.

MALE SPEAKER: I wanted to show you an example, and you can tell me if it’s a good thing or a bad thing. Can you go to this URL?
JOHN MUELLER: OK. I hope it’s not a bad thing.

MALE SPEAKER: So if you look at the cached page that Google has– so this is one of the scrapers I was talking about. So they have scraped our content. And if you click on the cached page, Google says, “this is Google’s cache of cardhub.com,” dah, dah, dah, dah. Well, I didn’t ask for the cached page of cardhub.com, I asked for a cached page of that spammy site.

JOHN MUELLER: Yeah. So essentially, what’s happening there is we are recognizing that this site is scraping your site. And we’re saying, well, this is a scraped copy of this other site that we know about. And the other page is, essentially, the one that we choose to index. So we’ll say, we’ve indexed this URL, we have some content here. But actually, the content is the same as this main site, so we’ll show that on the cached page as well. So that’s actually kind of a sign that we’re following this scraper along, but we’re essentially focusing on your content and not on their content. So with a site query–

MALE SPEAKER: From a [INAUDIBLE] standpoint, wouldn’t it be much better to just not index that site at all? I mean, what’s the benefit of indexing a site that you already know is a clear scraper and is spam? And the second thing is, isn’t it, from a user’s standpoint again, a little bit confusing? I’m looking at the cached page– again, I don’t think any users do that, because I don’t think you show them in your search results. But maybe there are some edge cases where I’m looking for a cached page of this, and I’m looking at a different URL?
JOHN MUELLER: Yeah. It’s always a bit of a tricky situation in a case like this where you’re specifically doing a site query of one site. And we recognize that the content is duplicate, that this is something that we’ve indexed under multiple URLs. But you’re specifically asking for a URL from this site. So that’s something where we say, well, we know this content is actually the main content on CardHub, or wherever. And we’ve also got a URL here from this site that this user is specifically asking about, which is essentially the same. So kind of what we’re doing is trying to show you what you’re asking for. So if you specifically asked for this content, we’ll show it to you. But that doesn’t mean we’d be showing it in search. That doesn’t mean that we’d be splitting any value among those sites. This is, essentially, just trying to do what the user asked us. And in this case, it’s confusing.

MALE SPEAKER: And is it fair to say that this is a classic example of something that you should disavow? Or is it something you wouldn’t bother with, because obviously, we didn’t set up the spammy site?

JOHN MUELLER: I wouldn’t worry about this one, actually. This is something where if this is a clear scraper site, then we’ll pick up on it. When we picked up on this URL specifically, we noticed that it’s actually your content. I don’t think there’s anything that you need to do there. With regards to disavow, I don’t think you’d need to do this anyway there, because these aren’t unnatural [INAUDIBLE] links. This is essentially just a copy of your content.

MALE SPEAKER: So just to make it clear, you’re saying we shouldn’t disavow things that we have nothing to do with?

JOHN MUELLER: I think if you’ve found something that you don’t want to be associated with, putting it in a disavow file is absolutely fine. It won’t cause any problems, and maybe [INAUDIBLE].

MALE SPEAKER: [INAUDIBLE] something to do with the scraper, right?

JOHN MUELLER: No, no. The disavow doesn’t mean that you have anything to do with it. It’s not an admission of guilt or anything. It’s just that you don’t want to be associated with these links. Fine. That’s something you can choose.

MALE SPEAKER: OK. Cool. OK, thank you.

JOSH: Hey, John. How you doing?

JOHN MUELLER: Hi, Josh.

JOSH: How is it in Switzerland today?

JOHN MUELLER: Nice.

JOSH: It looks nice in the reflection behind you. I have a quick question for you there. User experience seems to be important for Panda. But I’m slightly concerned. Is it possible that competitors– now, this is a little far-fetched. But is it possible that competitors could hire people in some form to go around your website, and pretend like it has a bad user experience, and bounce around, and stuff like that? I don’t know what they would do. But bounce around, go back to Google. Is it possible for them to do that? And if it is, how would we combat that? I guess, theoretically, the best way to combat that would just be to have a really good site and have thousands of people who love it, so if 20 people go on there and seem to hate it, it doesn’t matter?
JOHN MUELLER: I guess that makes sense, yeah I mean, I don’t see this as a theoretical issue I know there are people that try to do this all the time And that’s something that I wouldn’t worry about, from Google’s point of view These kind of activities– you’ve seen them on Mechanical Turk You’ve seen them on Fiverr, for example That’s something that’s been happening for a really long time So I wouldn’t call it too far-fetched But at the same time, I wouldn’t say that that’s something you need to worry about with regard to Google or probably any of the other search engines BARUCH LABUNSKI: Well, there’s security stuff you can use as well, Josh, for those kind of stuff JOHN MUELLER: Yeah I think there’s some kind of, like, plug-ins that you can use to catch those kind of activities But essentially, if someone is offering this as a service where they’re saying, well, I’ll pay you, I don’t know, 10 cents for every site that you visit on the internet, then that’s something that they’d be doing with their normal browser Then that’s something that’s really hard to catch from your site directly BARUCH LABUNSKI: No, I’m saying in terms of infrastructure protection, there are services out there that can go directly into your server and figure out, OK, this is a bad bot or a bad user that keeps on coming And it can recognize So there are solutions for this, Josh JOHN MUELLER: Yeah Sure That’s a possibility as well But from Google’s point of view, this is definitely not something I’d worry about JOSH: OK Thanks, John JOHN MUELLER: All right Let’s grab some more from the Q&A “Why would an incorrect page rank for a search term and not the most optimized one? 
Links pointing to it, keyword stuffing, Google isn’t ranking the right pages for some search terms, and ranking affiliated pages instead.” I’d probably have to take a look at some of the examples to see what specifically you’re looking at there Within the websites, sometimes it happens that we can’t recognize what a specific page is about, or maybe it looks like a page is kind of artificially inflating its importance with regard to specific terms So that could be from keyword stuffing on those pages A lot of times, we also just see technical issues within a website, or maybe the internal linking structure isn’t as clear as it could be, those kind of things But if you have specific examples, I’m happy to take a look at that to see what we should be doing better there, what we might want to tweak in our algorithms to recognize that a little bit better MALE SPEAKER: John, related to that question, with all the keywords basically pretty much disappeared from any statistics, how do you determine which pages people are going to wrongly through keywords? For example, on mine, if you type Virtual Office London, you go to the home page instead of our London page So first of all, how do I recognize those pages now I can’t use keyword terms? And second of all, how do I then deal with that situation? JOHN MUELLER: I mean, that’s something where you see a lot of information in Webmaster Tools, at least the keywords that people are searching for, which you can also try out directly to see what they might be seeing on search So that’s probably what you did there You know that these are keywords people are searching for, so you went to google.co.uk and said, I’m looking for whatever, London And you probably saw that the wrong page is showing up So that’s something that’s kind of an iterative process
there, which I think is kind of normal in this case It’s not something where there’s really a direct one-to-one mapping that we’d provide With some keywords, when we see a lot of activity in Webmaster Tools, we’ll also show the landing pages that we found there, that they’re ranking as well We’ll show that in Webmaster Tools We don’t show that for all keywords or all keyword combinations, because sometimes we just don’t have enough data to actually provide useful information there for you So that’s something where to some extent, you’ll see that there; to some extent, you have to try that out With regard to what to do when you see that it’s showing the wrong page, that’s always a bit harder So essentially, what you’ll want to do there is kind of the same as you would with any kind of SEO activity, is make sure that your technical foundation is as good as it should be So the internal links are working properly, the content is essentially set up correctly within the website And then work to make sure that the quality side is correct as well, so that your London page, for example, isn’t just keyword-stuffed London text or copies of Wikipedia text I know this isn’t the case with your site specifically, but sometimes we see those things on the web So those are usually the kind of situations there If this is something that’s completely wrong on our side, then that’s sometimes useful to bring back to the search engineers as well where we say, for this specific keyword combination, we’re showing the home page of this site when clearly, the homepage is on a much, much more general area than actually what the user was searching for And to send people to the home page would be to provide them a big disservice So that’s the kind of situation where we would say, we should take a look at that on our algorithmic side, talk to the search engineers, and make sure that we’re getting that a little bit better MALE SPEAKER: Yeah I mean, that’s generally what I feel is happening, 
because it’s just not a good situation for our customers to land on our home page, especially if we can’t detect the keywords now to make sure that we can maybe display something to them to say, we can see that you’re looking for London, so the first item we’re going to show you is London-related, so we can point you in the right direction We obviously can’t do that anymore And the London landing page, we know that’s what they’re looking for So yeah, it’s kind of a difficult situation for us It’s a bad user experience, because we cater for around 150 different locations around the world So we know that the home page is too generic for it, and we’re probably losing out on sales and disappointing customers And now we’re ranking well in Google We’re in, like, third place Now it’s not entirely accurate And if some search user group of yours actually goes through the search results and says, is this a quality site to be pointing to people to, well, what if you haven’t looked at everybody else’s landing page for London? 
Theirs is good, and ours isn’t And we could be demoted on that basis Yet our London page is actually excellent JOHN MUELLER: Well, that’s not something where we’d demote a site for So specifically, the kind of search results reviews that we do, that’s something that we do more to fine-tune our algorithms and make sure that the algorithms are working right So if we were to recognize that we’re showing your general page instead of the London-specific one, then that’s something where, in the worst case, if someone from the Search Quality team were reviewing that search result, they’d say, we need to work on this search result So it’s not that your site is bad, it’s essentially our algorithm that’s bad in a case like that So we don’t go through the search results on that basis and say, well, this site really matches those search results, we’re going to make sure it always stays number one We’re essentially just testing our algorithms and making sure that we’re getting those improvements right And a lot of times, what happens with our algorithms there is that we’ll take two search results pages and say, this is with maybe one version of the algorithm, this is more with a different version of the algorithm, or with an algorithm without this algorithm And we’ll have reviewers go through those pages and say, this is good, or this is bad, or these results are better, and this is why, so that we can fine-tune that algorithm So it’s not that we tweak the sites out of the search results, but make sure that we’re showing the right pages there BARUCH LABUNSKI: So these Google reviewers, quality search reviewers, where are they from? Are they from around the world? Or are they specifically from California? JOHN MUELLER: We try to get them from around the world And that’s something where we try to do what we need to do to make people able to test these search results

So specifically, if we’re looking at pages in French, then we probably don’t want to take the average American who doesn’t really speak French to review those search results, because they wouldn’t be able to do that We really need to have native speakers who actually can look at that specifically BARUCH LABUNSKI: Well, you always have Google Translate JOHN MUELLER: Yeah I don’t know if we’d want to use Google Translate to review the quality of our search results I really like Google Translate, but sometimes it has some creative ideas BARUCH LABUNSKI: Right JOHN MUELLER: All right Let’s run through some more of these questions in the Q&A and see if we have some time afterwards “Can we expect any more and often updates in Webmaster Tools? For example, the search queries, click-through rates, more examples from linked domains?” Yes, we’re working on improving the quality of the data and the features there, specifically around search queries at the moment So I’m hoping, at some point in the future– maybe, I don’t know It’s hard to say Early next year, I guess, we could see more there “Does Google always treat No Follow links as No Follow? 
Or does it act as a suggestion similar to what the canonical tag does?” In general, the No Follow is something that we use as a technical method So it’s something that we try to follow as much as possible So it’s not something that we’d use as a signal But at the same time, we still kind of reserve the right to act upon any abusive issues that we run across I’m not aware of any kind of abusive issues around the No Follow that we’ve seen in the past there But we try to keep that door open, so that if we recognize someone doing really, really sneaky things with this or with any of the other methods that we have available, then we’ll still reserve the right to take action on that and try to keep our search results clean in that regard So primarily, it’s a technical tool, and we treat one to one We essentially drop those links completely from our link graph I just want to keep that one disclaimer that there might be some places where we kind of have to take action on this even if we do drop those links “Can medical services and therapies have reviews for rich snippets?” Sure, that’s essentially open That’s not limited to any specific kind of product or service “I collect reviews for my services on one page on the website The reviews have no ratings [INAUDIBLE] stars It’s essentially just text Can I use this page for rich snippets and ratings?” Essentially, for review-rich snippets, we want to have one clear page that’s about one specific product or service And that’s essentially what you can use for rich snippets there So if you have one page on your website that’s for everything that you offer, then that’s probably not so useful for rich snippets MALE SPEAKER: John, generally, you wouldn’t want to see rich snippets in general on the home page? 
JOHN MUELLER: For most sites, we wouldn’t expect to see that on the home page, because the home page is about the business in general There can be situations where maybe the home page is about a specific product or service, and then it might make sense to put that on there But for most businesses, you have a general home page, and the services and products are somewhere else from the home page “How to optimize an image for SEO Is having a keyword in the file name and useful alt tag all that’s needed?” I think both of those are good things to have Another good aspect is to have some kind of a subtitle on those pages And finally, one more aspect to keep in mind is that the clearer it is that this image is about a specific topic, the easier we can actually rank it for that So if you have a page with hundreds of thumbnails on there, and they have a subtitle [? and an alt ?] text, then that’s really hard for us to pick up on Whereas if you have one specific landing page per image, then that’s often a lot easier for us to recognize that this image is about this specific topic “If a website is hit by Panda, cleans up, and fixes everything, what’s the next step? Should we submit our URL for verification, or just have to wait until the next Panda to regain our rankings?” Essentially, working to really clean up your website’s content, the quality of your website overall, is really important there You don’t need to do anything technical past that So you don’t need to do a reconsideration request, which, as far as I know, you can’t do for algorithmic issues anyway You don’t need to request the URLs to be re-crawled

But you could do that, for example, if you wanted to have that update a little bit faster in search For issues like Panda, which are essentially site-wide issues, usually, the crawling of individual pages is not going to make a big difference So really make sure that your website is the highest quality it can be overall And essentially, we’ll pick up on that data over time and take that into account One thing to keep in mind, specifically with regards to our quality algorithms, is we’re not just looking to see if the text is unique and say, well, this is unique text, therefore, it must be high quality We really want to see the whole website overall as being high quality So it’s not just enough to say, well, this text, if I copy and paste it into Google, doesn’t show any other matches Therefore, it’s great-quality content We really want to see that there’s more behind it than just unique text Unique text is easy to create A high-quality website, it takes a bit of work, and it takes a bit of analysis of what users are actually trying to do on your website, and making sure that your website matches their expectations So that’s something that’s really hard to define, and not really a technical issue where you could say, well, I fixed up the quality of my website, therefore, Google will take that into account immediately It’s not that black and white So that’s something where taking a step back, getting more people’s opinions can make a really big difference “I have too many reviews for one page but want to send the count on the home page a note [? say ?] 
the page must also contain review markup for each reviewed item Count should only include reviews on your own site It implies review count for the site, not the page.” I’d have to take a look at the rich snippet guidelines, specifically with regard to that I don’t really know the answer to that I know some sites have this paginated reviews system where you have one main review page, and then you could paginate through the individual reviews From my point of view, I think that’s fine But I’d have to take a look at the [? rich snippet ?] guidelines to really double-check that MALE SPEAKER: John, did you do a Hangout about hreflang and the HTTPS? You said you were going to try and make it clearer for everybody I don’t remember I may have missed a whole display You said you were going to put a slide together for it JOHN MUELLER: I don’t remember [LAUGHS] Too many Hangouts Yeah MALE SPEAKER: It would have been in the last two Hangouts if you did But I might have missed it But you mentioned it about three or four weeks ago JOHN MUELLER: I don’t remember So that’s probably another one MALE SPEAKER: Yeah Could you possibly post something on your Google+ as maybe a drawing of some sort from somewhere that explains exactly– if you have HTTPS and you have hreflang and any canonical whatever issues, how to exactly structure that? JOHN MUELLER: Sure MALE SPEAKER: Yeah That would be very helpful JOHN MUELLER: Sure MALE SPEAKER: Hey, John You mentioned about Panda I know that at some point, you guys had announced that you were doing updates on a frequent basis It was part of the daily processing, if you will It appears that now, they have moved to more one-off updates like Penguin Is that a fair assessment? Then, if yes, do you guys have any plans of moving Panda back into a rolling schedule, so that Webmasters don’t have to wait for the next refresh to see improvement? 
JOHN MUELLER: One thing we did there is essentially update the algorithm, so that’s one aspect there But it should be fairly regular now So it shouldn’t be something where I’d say you’d have to wait six months before an update there I know with the previous Panda, as well, we’ve been running updates a lot more frequently So that’s something where I imagine, in this case as well, we’ll just be rolling these more frequently And this was a bigger step, because there were some bigger changes there MALE SPEAKER: Good But it’s still not part of the daily processing– and so, like, for example, you would still not expect to see some sites recovering until you have an official refresh JOHN MUELLER: I don’t think we’d announce those kind of refreshes, because they just happen so regularly So it’s not something– I don’t think you’d count on it as being daily
[INAUDIBLE] we’d say this happens every six months, therefore, it’s a big jump It’s more regular than that But I don’t have any specific time frame where I can say, you should expect these changes to happen weekly or monthly That’s something where I know the engineers like to have a little bit of flexibility in that when they can roll it out more frequently When they can just have this data update automatically, they’ll try to do that And sometimes it takes a little bit longer to review bigger changes And they like to have the flexibility of being able to do that MALE SPEAKER: Got it JOHN MUELLER: So that’s not something where I can promise any specific time frame But I know the regular updates is definitely one of the things they’re trying to do there MALE SPEAKER: Got it OK So even if you guys do not change the algorithm, you will still try to re-run it as often as you can? JOHN MUELLER: Yeah MALE SPEAKER: OK Thank you That’s helpful JOHN MUELLER: “I’m an author of a book which has reviews on Amazon Is there any way to use them as my author website in rich snippets? What about reviews I get via email, which I publish on my author site?” So if these reviews were submitted somewhere else, then that’s not something we’d want to see you reuse for rich snippets So don’t scrape the reviews from Amazon and put them on your website We really want the unique reviews on your website itself to be marked up for rich snippets So it should be content that’s unique to your website With regards to reviews that you get sent in by email, one of the difficulties there is, of course, mapping that to any kind of a star rating If someone sends you an email, says, “hey, this was a great book, I really enjoyed it,” how do you map that into a star rating without skewing it in your favor? 
And how do you pick and choose those emails and say, well, these five people that really loved my book, I’ll put them on my website and use them for rich snippets And those five people that didn’t like my book, I’ll just assume that I never received those emails and kind of ignore them So when you’re filtering things like that, that’s something where I worry that rich snippet markup isn’t really the best choice there So bringing them onto your website is a great idea Marking them up for rich snippets, probably not so much OK We have, oh, one minute left Well, let’s see if we have any other emails MALE SPEAKER: John, could I quickly ask you about my [INAUDIBLE] UK site, hreflang site, and Webmaster Tools shows 20 pages indexed But there’s thousands of pages if you use the site query Is something wrong with Webmaster Tools? JOHN MUELLER: Hopefully not Otherwise, I’ll have to talk to them this evening when I have a meeting with them Where are you looking? Are you looking at the site map’s count? MALE SPEAKER: Yes I go into the site map page, and it says it’s got 8,000 pages that it’s obviously got from my site map But it only shows 20 pages are actually indexed But that’s not true But I wonder whether or not there was some complication because it was an hreflang, not the original, and maybe some swaps happening or something But the site query shows something very different JOHN MUELLER: hreflang should be irrelevant there So that shouldn’t matter But specifically, with regard to site maps, index count, one thing to keep in mind is the site map’s count looks at the very exact URL that you have in your site map file So if there’s a slightly different URL that’s actually indexed, we won’t count that for your site map file And sometimes, we’ll see that with, like, URL parameters; sometimes with [? www, ?] with [? non-www; ?] 
with upper, lowercase in the URL; dashes or underscores where maybe your CMS interchangeably uses those, but you submit one version in the site map, and a slightly different version is indexed And then we say, well, that URL that you submitted is not actually indexed Therefore, we’re not going to count it So what I tend to do in cases like that is try to break the site map file down So break it up, maybe into 5 or 10 site map files that are easier to look at, and submit those in Webmaster Tools And if you find a site map file that has zero indexed URLs, then take a few of those sample URLs in the site map file and do something like an info query in Google, like just info, colon, and then the URL And see which URL is actually indexed for that content So we’ll probably have crawled that URL and indexed the content there, but maybe we picked a different one to index the content under And with an info query, you’ll quickly see that And it sounds like maybe there’s a systematic issue with your site map files, maybe something simple like a trailing slash or no trailing slash Those are the kind of things where site maps would say, well, this isn’t the exact URL, maybe you’re
indexing the wrong thing But from your point of view, you’d say, well, this is good enough And maybe that’s a good sign to update the site map file, or to look at the internal links on your website, and say, well, internally, I’m linking to this version of the URL, I’ll match it to– point to the version that I have in my site map files So it’s probably not a critical issue But it’s usually something worth cleaning up, so that you can look at this data in a little bit more of an easier to understand way MALE SPEAKER: Do you have a couple more seconds there, John? JOHN MUELLER: Sure MALE SPEAKER: Matt Cutts mentioned something interesting at the last SMX Advanced He mentioned that your web forms should have auto-complete if possible He was talking about mobile, I believe But I was wondering, how important is it to have a streamlined– not like 10 questions on the web form, but as few fields to fill in as possible and auto-completed? How important is that for your quality algorithms? JOHN MUELLER: I don’t think we use that at all for search So– BARUCH LABUNSKI: He was just mentioning that desktop and mobile people get lazy I remember that, yeah JOHN MUELLER: Yeah, I don’t think we use that at all for search What we are looking at is generally, if a page has a mobile version or not And that’s something that we’d like to reflect in search, where we’d like to treat that appropriately in the search results But if it’s a mobile-friendly version and it happens to have 50 fields for a contact form, then that’s essentially your loss That’s not something where we’d say, well, this is a bad search result It’s essentially someone going to your website and not being able to follow through and complete whatever task they were trying to do And essentially, this is a client coming into your store, and you make it so hard for them to actually buy anything from you Then it’s your loss It’s not something where we’d say, this is a problem from search specifically But we send them to your 
site They kind of like the content on your site, but they’re not able to follow through And that’s more your problem than our problem MALE SPEAKER: OK, great JOHN MUELLER: All right So let’s take a break here I think the next Hangout we have planned on general best practices for 2014 to see what things have changed over time And the next Monday Hangout after that, we’ll have someone join us from the Google News team So if you’re a Google News publisher, be sure to join us BARUCH LABUNSKI: Barry Schwartz? JOHN MUELLER: Yeah, sure Go for it, yeah OK So with that, I’d like to thank you all for your questions and for your time It’s been really insightful Lots of good feedback as well I’ll definitely take the spam report that I got there and pass that onto our team, make sure that we’re looking at that properly, and double-check my notes to see if I’m missing anything else So I hope you guys have a great week, and maybe see you guys again MALE SPEAKER: Thank you, John Bye-bye MALE SPEAKER: Thanks, John Have a good meeting this afternoon MALE SPEAKER: Thanks, John JOHN MUELLER: Bye BARUCH LABUNSKI: Bye, John
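
The sitemap debugging approach John describes earlier in this hangout (break the sitemap into smaller files, then look for submitted URLs that differ from the indexed URL only by a trailing slash, letter case, or dashes versus underscores) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not a Google tool or API: the function names `split_sitemap`, `loose_key`, and `near_misses`, the normalization rules, and the example.com URLs are all hypothetical, and the list of indexed URLs is assumed to come from your own manual checks, such as the info queries mentioned above.

```python
from urllib.parse import urlsplit


def split_sitemap(urls, parts):
    """Split a list of sitemap URLs into at most `parts` roughly equal chunks."""
    size = max(1, -(-len(urls) // parts))  # ceiling division, never zero
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def loose_key(url):
    """Reduce a URL to a key that ignores small, easy-to-miss variations.

    Hypothetical normalization: ignores the scheme, a leading "www.",
    letter case, a trailing slash, the query string, and treats "_" and
    "-" in the path as the same character.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.lower().rstrip("/").replace("_", "-")
    return host + path


def near_misses(sitemap_urls, indexed_urls):
    """Pair each submitted URL with an indexed URL that differs only trivially."""
    indexed_exact = set(indexed_urls)
    by_key = {loose_key(u): u for u in indexed_urls}
    pairs = []
    for url in sitemap_urls:
        if url in indexed_exact:
            continue  # exact match, so it would count as indexed
        match = by_key.get(loose_key(url))
        if match is not None:
            pairs.append((url, match))
    return pairs


if __name__ == "__main__":
    submitted = [
        "http://www.example.com/London_Office/",
        "http://www.example.com/paris",
    ]
    indexed = [
        "http://example.com/london-office",
        "http://www.example.com/paris",
    ]
    for sitemap_url, indexed_url in near_misses(submitted, indexed):
        print(sitemap_url, "is indexed as", indexed_url)
```

Each pair reported by `near_misses` is a candidate for updating either the sitemap file or the site's internal links so that both point at the same version of the URL.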