Issue
I am crawling a web page. A section of the page's source code is below:
<div class="accordion-row">
<h4 class="accordion-title down-arrow">The Problem</h4>
<div class="accordion-content">
<p>
Solving the climate crisis needs us to collaborate at all levels. Climate impacts are generally associated with heavy industries, such as cement, aviation and, of course, energy. But transformation needs all of us, and the creative sector – worth £111.7 billion to the UK in 2018, equivalent to £306 million every day, and £166 billion in the USA – has impacts just as every other sector does, and of course, its influence is immeasurable.<br />
<br />
When JB was founded it was clear that, in order to reduce greenhouse gas emissions, collective ambition needed a common starting whistle, a baseline and a sound roadmap. With no environmental impact data available for the arts (a problem which persists today in most countries) it was not possible to start this journey. Therefore, advocating the relevance of the climate crisis to culture, and building the tools and resources to take action was critical. We started by:<br />
<br />
1. gathering data on greenhouse gas emissions with tolls co created with culture, collected at scale and over time;<br />
2. championing case studies, solutions and stories of change;<br />
3. building networks, knowledge-chains for the creative community, with events, training, projects and research programs;<br />
4. producing culturally specific resources. <br />
<br />
These foundations have evolved into a rich program that is now exploring creative climate leadership in order to scale the unmet potential of culture in all its manifestations, championing artists particularly through the lens of climate justice and focusing on cultural policy-making as a rapid agent of change. All the data, both quantitative and qualitative, is proving invaluable.<br />
<br />
The climate crisis is a cultural crisis, in that it reflects our deepest values and identities. Environmental injustice is a growing problem and reflects cultural values that have, too often, championed human – and white - supremacy over all else; and therein lies the opportunity. Culture – artists, theatre makers, festival organizers, galleries and museums, poets and story-tellers – should be at the center of climate action, reframing the stories we tell ourselves and offering visions of a regenerative world in tune with the needs of nature and community. One of JB’s most enduring partnerships with a consortium of cultural organizations in Manchester has resulted in culture recognized as vital for the city’s ambitious climate targets. This is a rare recognition of culture’s potential.
</p>
</div>
</div>
<div class="accordion-row">
<h4 class="accordion-title down-arrow">The Strategy</h4>
<div class="accordion-content">
<p>
Julie’s Bicycle has at its core a free resource base for anyone in the world to use. The first is a set of carbon calculators co-produced by the UK cultural sector which provide a snapshot of, for example, the carbon footprint of tours, productions, buildings etc. These are used throughout the program for benchmarking and planning. These tools have been used by 5000 organizations across 50 countries. The Creative Green program offers consultancy, environmental reporting, and the world’s first (though now not only) Green Certification for artistic organizations. The Green Tools are a starting point and Alison is developing a much broader, more holistic set of measurement tools, including science-based target tools, to provide to organizations. The Creative Green consultancy is an engine that helps to power the unfunded leadership work – advocacy and new ideas on the front lines of change, the ‘profit maker’ raising money through consultancy fees and royalties to fund policy shift and systems change. <br />
<br />
These Creative Green tools provide a rich source of data and ‘intelligence’ on thousands of organizations and their procedures, progress, and approaches that inform all of Julie’s Bicycle’s policy work and future initiatives – a core part of their effectiveness. Any policy work is tracked, costed, and disseminated. Her research has been peer-reviewed by Oxford University.<br />
<br />
Julie’s Bicycle has worked with many cities to make the links between culture and climate policy clear, and build culture into sustainability strategies. The Manchester Arts Sustainability Team (MAST) found in a study that working with arts and young people was the most effective way to change behavior towards climate change. Together with Julie’s Bicycle, their strategy is now being replicated in six other cities and has received an EU grant to replicate in six European countries.<br />
<br />
She advocates for cultural policy change on a local, national, and international policy level to include culture in climate change plans. Julie’s Bicycle has a pioneering partnership with Arts Council England, which has required its core grant beneficiaries (currently around 800) to report annually on environmental impacts and have a policy. This requirement, in pace since 2012, has evolved and now includes the Accelerator Programme – 10 innovative projects led by organizations; and the Spotlight Programme, developing science-based targets with 30 of the 60 organizations that produce the majority of carbon emissions. Arts Council England’s policy has catalyzed an overall reduction of 41% of CO2 across the portfolio of 850 organizations, and a cost savings of around £16-20 million. Other national Arts Councils are looking at this model, and JB has presented, especially recently, to many. And beyond impacts alone, climate and the environment more generally are issues of huge concern and interest – indeed, over half of Arts Council England’s portfolio is now creating or commissioning work related to climate and climate justice.<br />
<br />
Finally, she engages with artists and cultural leaders in centering climate work in their creative practice and entrepreneurship. An intensive Creative Climate Leadership program for artists and producers, and commissioning and events on climate justice, aims to build a global network of cultural changemakers. She is keen to build capacity within the creative community and lift up artistic voices as agents of change. <br />
<br />
Alison’s work has catalyzed a powerful industry, and is using the talents and power of that industry – the influential voice of cultural producers and artists – to change the narrative on climate, as well as change national and international climate strategy. She is working with the biggest institutions in government and culture to change their own processes. She has partnered, or supported, sister initiatives in fashion, media and film industries.<br />
<br />
To scale, she is planning to invest £700,000 to update the digital tools and resources and make them relevant for different country contexts.<br />
<br />
Alison also has a keen focus on moving further internationally. She is in talks with potential partners in Ireland, Canada, Spain, Denmark, and Germany, and plans to license the tools to international cultural organizations, and launch an international Creative Green Kitemark. She wants to build an internationally tested model for a Green Deal that has policy influence at COP26 and beyond, publishing a public policy toolkit using her decade of data and findings. Her goal is also to activate a global network of cultural changemakers through her Creative Leadership programs, commissions, and other supports. Eventually, her vision is a series of regional hubs around the world, collaborating on change.
</p>
</div>
</div>
To crawl it, I have used this line in my code: introduction = response.css('.accordion-content').extract()
This works, but it returns the content of every accordion section in one go.
Ideally, I would like to crawl the sections inside the accordion separately. For instance, I would like to crawl the paragraph that starts with
<h4 class="accordion-title down-arrow">The Problem</h4>
separately, and the one that starts with
<h4 class="accordion-title down-arrow">The Strategy</h4>
separately. The reason is that we only need "The Strategy" section, not all of the sections.
I rarely use CSS, so I'm unsure how to write a selector that makes the crawler return only the required paragraph.
Does anyone have an idea?
Solution
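extract() returns a list with one item per matching element, so "The Problem" paragraph is introduction[0] and "The Strategy" paragraph is introduction[1].
If you want to scrape them separately, you can select each row by position (this assumes the two accordion-row divs are the first and second children of their parent element):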
problem_paragraph = response.css('div.accordion-row:nth-child(1) > div').get()
strategy_paragraph = response.css('div.accordion-row:nth-child(2) > div').get()
This returns each section's HTML, so the result still includes tags such as <br>.
To get only the text of each paragraph, without any tags, you can use XPath with string():
problem_paragraph = response.xpath('string((//div[@class="accordion-content"])[1]/p)').get()
strategy_paragraph = response.xpath('string((//div[@class="accordion-content"])[2]/p)').get()
Answered By - SuperUser